Posted: 2023-04-07 02:21:22 RSS feed digest for 2023-04-07 02:00 (27 items)

Category · Site · Article title / trending word · Link URL · Frequent words / summary / search volume · Date registered
IT 気になる、記になる… Twitter Blue perk "half as many ads" finally rolling out https://taisy0.com/2023/04/07/170429.html twitter 2023-04-06 16:08:57
AWS AWS Government, Education, and Nonprofits Blog Supporting health equity with data insights and visualizations using AWS https://aws.amazon.com/blogs/publicsector/supporting-health-equity-data-insights-visualizations-aws/ Supporting health equity with data insights and visualizations using AWS. In this guest post, Ajay K Gupta, co-founder and chief executive officer (CEO) of HSR health, explains how the healthcare technology (HealthTech) nonprofit HSR health uses geospatial artificial intelligence and AWS to develop solutions that support improvements in healthcare and health equity around the world. 2023-04-06 16:29:18
AWS AWS How do I launch an Amazon RDS database instance that's covered by the AWS Free Tier? https://www.youtube.com/watch?v=RFEhB9rM7cE How do I launch an Amazon RDS database instance that x s covered by the AWS Free Tier Skip directly to the demo For more details see the Knowledge Center article with this video Vetrivel shows you how to launch an Amazon RDS database instance that s covered by the AWS Free Tier Introduction Additional information Create an RDS database instance ClosingSubscribe More AWS videos More AWS events videos ABOUT AWSAmazon Web Services AWS is the world s most comprehensive and broadly adopted cloud platform offering over fully featured services from data centers globally Millions of customers ーincluding the fastest growing startups largest enterprises and leading government agencies ーare using AWS to lower costs become more agile and innovate faster AWS AmazonWebServices CloudComputing 2023-04-06 16:26:39
AWS AWS What do I need to know if my Free Tier period with AWS is expiring? https://www.youtube.com/watch?v=OarU0LRFWDc What do I need to know if my Free Tier period with AWS is expiring Skip directly to the demo For more details see the Knowledge Center article with this video Jean Carlos shows you what you need to know if your Free Tier period with AWS is expiring Introduction Additional information Check what resources are active Remove resources you do not want ClosingSubscribe More AWS videos More AWS events videos ABOUT AWSAmazon Web Services AWS is the world s most comprehensive and broadly adopted cloud platform offering over fully featured services from data centers globally Millions of customers ーincluding the fastest growing startups largest enterprises and leading government agencies ーare using AWS to lower costs become more agile and innovate faster AWS AmazonWebServices CloudComputing 2023-04-06 16:21:57
AWS AWS - Webinar Channel Analytics in 15: Ingest, Manage, and Analyze Data with Central Governance- AWS Analytics in 15 https://www.youtube.com/watch?v=Upb78zcijG0 Analytics in Ingest Manage and Analyze Data with Central Governance AWS Analytics in If your organization shares data especially in large scale environments with multiple producer and consumer clusters this is a session you don t want to miss Within minutes you will learn how Amazon Redshift can help you ingest streaming data auto copy data from S and centrally manage data across multiple data shares all without manual scripting or complex querying Learning Objectives Objective See how data sharing access control with AWS Lake Formation works in action Objective Learn how to centrally manage granular access to data across all consuming data services Objective Understand how Amazon Redshift Amazon S AWS Lake Formation and AWS streaming services work together to help you ingest manage and analyze data To learn more about the services featured in this talk please visit To download a copy of the slide deck from this webinar visit 2023-04-06 16:15:01
python New posts tagged Python - Qiita Learning the basics of SQL from ちゅらデータ's Hishinuma-san https://qiita.com/Zakuro890/items/1b0c61e29f5549e9cf1c lecturer 2023-04-07 01:36:35
Git New posts tagged Git - Qiita How to fix it when git commit leaves you at a "dquote>" prompt https://qiita.com/muuuuu/items/5c426bac4480beff1626 commit -m "xxxxyyyyzzzz … dquote> 2023-04-07 01:18:53
海外TECH MakeUseOf Will the Microsoft and Activision/Blizzard Deal Be Good or Bad for Gamers? https://www.makeuseof.com/is-microsoft-and-activisionblizzard-deal-good-or-bad-for-gamers/ Will the Microsoft and Activision Blizzard Deal Be Good or Bad for Gamers? The Microsoft and Activision Blizzard deal is the biggest video game acquisition in history. But is this really beneficial to gamers? Let's find out. 2023-04-06 16:30:17
海外TECH MakeUseOf How to Insert Special Characters with Custom Keyboard Shortcuts in Windows 10 & 11 https://www.makeuseof.com/insert-special-characters-keyboard-shortcuts-windows/ How to Insert Special Characters with Custom Keyboard Shortcuts in Windows 10 & 11. Forget Alt codes; here's how to make inserting special characters easier to remember with a custom Windows shortcut. 2023-04-06 16:15:16
海外TECH DEV Community Integrate Apache Spark and QuestDB for Time-Series Analytics https://dev.to/glasstiger/integrate-apache-spark-and-questdb-for-time-series-analytics-3i3n Integrate Apache Spark and QuestDB for Time Series AnalyticsSpark is an analytics engine for large scale data engineering Despite its long history it still has its well deserved place in the big data landscape QuestDB on the other hand is a time series database with a very high data ingestion rate This means that Spark desperately needs data a lot of it and QuestDB has it a match made in heaven Of course there is pandas for data analytics The key here is the expression large scale Unlike pandas Spark is a distributed system and can scale really well What does this mean exactly Let s take a look at how data is processed in Spark For the purposes of this article we only need to know that a Spark job consists of multiple tasks and each task works on a single data partition Tasks are executed parallel in stages distributed on the cluster Stages have a dependency on the previous ones tasks from different stages cannot run in parallel The schematic diagram below depicts an example job In this tutorial we will load data from a QuestDB table into a Spark application and explore the inner working of Spark to refine data loading Finally we will modify and save the data back to QuestDB Loading data to SparkFirst thing first we need to load time series data from QuestDB I will use an existing table trades with just over million rows It contains bitcoin trades spanning over days not exactly a big data scenario but good enough to experiment The table contains the following columns Column NameColumn TypesymbolSYMBOLsideSYMBOLpriceDOUBLEamountDOUBLEtimestampTIMESTAMPThe table is partitioned by day and the timestamp column serves as the designated timestamp QuestDB accepts connections via Postgres wire protocol so we can use JDBC to integrate You can choose from various languages to create Spark applications and here we will go for Python Create the script sparktest py from pyspark sql import SparkSession create Spark sessionspark SparkSession builder appName questdb test getOrCreate load trades table into the dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable trades load print the number of rowsprint df count do some filtering and print the first rows of the datadf df filter df symbol BTC USD filter df side buy df show truncate False Believe it or not this tiny application already reads data from the database when submitted as a Spark job spark submit jars postgresql jar sparktest pyThe job prints the following row count And these are the first rows of the filtered table symbol side price amount timestamp BTC USD buy E BTC USD buy E BTC USD buy E only showing top rowsAlthough sparktest py speaks for itself it is still worth mentioning that this application has a dependency on the JDBC driver located in postgresql jar It cannot run without this dependency hence it has to be submitted to Spark together with the application Optimizing data loading with SparkWe have loaded data into Spark Now we will look at how this was completed and some aspects to consider The easiest way to peek under the hood is to check QuestDB s log which should tell us how Spark interacted with the database We will also make use of the Spark UI which displays useful insights of the execution including stages and tasks Spark connection to QuestDB 
Spark is lazyQuestDB log shows that Spark connected three times to the database For simplicity I only show the relevant lines in the log T Z I pg server connected ip fd T Z I i q c p PGConnectionContext parse fd q SELECT FROM trades WHERE T Z I pg server disconnected ip fd src queue Spark first queried the database when we created the DataFrame but as it turns out it was not too interested in the data itself The query looked like this SELECT FROM trades WHERE The only thing Spark wanted to know was the schema of the table in order to create an empty DataFrame Spark evaluates expressions lazily and only does the bare minimum required at each step After all it is meant to analyze big data so resources are incredibly precious for Spark Especially memory data is not cached by default The second connection happened when Spark counted the rows of the DataFrame It did not query the data this time either Interestingly instead of pushing the aggregation down to the database by running SELECT count FROM trades itjust queried a for each record SELECT FROM trades Spark adds the s together to get the actual count T Z I pg server connected ip fd T Z I i q c p PGConnectionContext parse fd q SELECT FROM trades T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I pg server disconnected ip fd src queue Working with the data itself eventually forced Spark to get a taste of the table s content too Filters are pushed down to the database by default T Z I pg server connected ip fd T Z I i q c p PGConnectionContext parse fd q SELECT symbol side price amount timestamp FROM trades WHERE symbol IS NOT NULL AND side IS NOT NULL AND symbol BTC USD AND side buy T Z I pg server disconnected ip fd src queue We can see that Spark s interaction with the database is rather sophisticated and optimized to achieve good performance without wasting resources The Spark DataFrame is the key component which takes care of the optimization and it deserves some further analysis What is a Spark DataFrame The name DataFrame sounds like a container to hold data but we have seen it earlier that this is not really true So what is a Spark DataFrame then One way to look at Spark SQL with the risk of oversimplifying it is that it is a query engine df filter predicate is really just another way of sayingWHERE predicate With this in mind the DataFrame is pretty much a query or actually more like a query plan Most databases come with functionality to display query plans and Spark has it too Let s check the plan for the above DataFrame we just created df explain extended True Parsed Logical Plan Filter side buy Filter symbol BTC USD Relation symbol side price amount timestamp JDBCRelation trades numPartitions Analyzed Logical Plan symbol string side string price double amount double timestamp timestampFilter side buy Filter symbol BTC USD Relation symbol side price amount timestamp JDBCRelation trades numPartitions Optimized Logical Plan Filter isnotnull symbol AND isnotnull side AND symbol BTC USD AND side buy Relation symbol side price amount timestamp JDBCRelation trades numPartitions Physical Plan Scan JDBCRelation trades numPartitions symbol side price 
amount timestamp PushedFilters IsNotNull symbol IsNotNull side EqualTo symbol BTC USD EqualTo side buy ReadSchema struct lt symbol string side string price double amount double timestamp timestamp gt If the DataFrame knows how to reproduce the data by remembering the execution plan it does not need to store the actual data This is precisely what we have seen earlier Spark desperately tried not to load our data but this can have disadvantages too Caching dataNot caching the data radically reduces Spark s memory footprint but there is a bit of jugglery here Data does not have to be cached because the plan printed above can be executed again and again and again Now imagine how a mere decently sized Spark cluster could make our lonely QuestDB instance suffer martyrdom With a massive table containing many partitions Spark would generate a large number of tasks to be executed parallel across different nodes of the cluster These tasks would query the table almost simultaneously putting a heavy load on the database So if you find your colleagues cooking breakfast on your database servers consider forcing Spark to cache some data to reduce the number of trips to the database This can be done by calling df cache In a large application it might require a bit of thinking about what is worth caching and how to ensure that Spark executors have enough memory to store the data In practice you should consider caching smallish datasets used frequently throughout the application s life Let s rerun our code with a tiny modification adding cache from pyspark sql import SparkSession create Spark sessionspark SparkSession builder appName questdb test getOrCreate load trades table into the dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable trades load cache print the number of rowsprint df count print the first rows of the datadf show truncate False This time Spark hit the database only twice First it came for the schema the second time for the data SELECT symbol side price amount timestamp FROM trades T Z I pg server connected ip fd T Z I i q c p PGConnectionContext parse fd q SELECT FROM trades WHERE T Z I pg server disconnected ip fd src queue T Z I pg server connected ip fd T Z I i q c p PGConnectionContext parse fd q SELECT symbol side price amount timestamp FROM trades T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I pg server disconnected ip fd src queue Clearly even a few carefully placed cache calls can improve the overall performance of an application sometimes significantly What else should we take into consideration when thinking about performance Earlier we mentioned that our Spark application consists of tasks which are working on the different partitions of the data parallel So partitioned data mean parallelism which results in better performance Spark data partitioningNow we turn to the Spark UI It tells us that the job was done in a single task The truth is that we have already suspected this The execution plan told us numPartitions and we did not see any parallelism in the QuestDB logs 
either We can display more details about this partition with a bit of additional code from pyspark sql functions import spark partition id min max countdf df withColumn partitionId spark partition id df groupBy df partitionId agg min df timestamp max df timestamp count df partitionId show truncate False partitionId min timestamp max timestamp count partitionId The UI helps us confirm that the data is loaded as a single partition QuestDB stores this data in partitions We should try to fix this Although it is not recommended we can try to use DataFrame repartition This call reshuffles data across the cluster while partitioning the data so it should be our last resort After running df repartition df timestamp we see partitions but not exactly the way we expected The partitions overlap with one another partitionId min timestamp max timestamp count partitionId It seems that DataFrame repartition used hashes to distribute the rows across the partitions This would mean that all tasks would require data from all QuestDB partitions Let s try this instead df repartitionByRange timestamp partitionId min timestamp max timestamp count partitionId This looks better but still not ideal That is because DaraFrame repartitionByRange samples the dataset and then estimates the borders of the partitions What we really want is for the DataFrame partitions to match exactly the partitions we see in QuestDB This way the tasks running parallel in Spark do not cross their way in QuestDB likely to result in better performance Data source options are to the rescue Let s try the following from pyspark sql import SparkSessionfrom pyspark sql functions import spark partition id min max count create Spark sessionspark SparkSession builder appName questdb test getOrCreate load trades table into the dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable trades option partitionColumn timestamp option numPartitions option lowerBound T Z option upperBound T Z load df df withColumn partitionId spark partition id df groupBy df partitionId agg min df timestamp max df timestamp count df partitionId show truncate False partitionId min timestamp max timestamp count partitionId After specifying partitionColumn numPartitions lowerBound and upperBound the situation is much better the row counts in the partitions match what we have seen in the QuestDB logs earlier rowCount rowCount and rowCount Looks like we did it T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount T Z I i q c TableReader open partition Users imre Work dbroot db trades rowCount partitionNameTxn transientRowCount partitionIndex partitionCount We can check Spark UI again it also confirms that the job was completed in separate tasks each of them working on a single partition Sometimes it might be tricky to know the minimum and maximum timestamps when creating the DataFrame In the worst case you could query the database for those values via an ordinary connection We have managed to replicate our QuestDB partitions in Spark but data does not always come from a single table What if the data required is the result of a query Can we load that and is it possible to partition it Options to load data SQL query vs tableWe can use 
the query option to load data from QuestDB with the help of a SQL query minute aggregated trade datadf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option query SELECT symbol sum amount as volume min price as minimum max price as maximum round max price min price as mid timestamp as ts FROM trades WHERE symbol BTC USD SAMPLE BY m ALIGN to CALENDAR load Depending on the amount of data and the actual query you might find that pushing the aggregations to QuestDB is faster than completing it in Spark Spark definitely has an edge when the dataset is really large Now let s try partitioning this DataFrame with the options used before with the option dbtable Unfortunately we will get an error message Options query and partitionColumn can not be specified together However we can trick Spark by just giving the query an alias name This means we can go back to using the dbtable option again which lets us specify partitioning See the example below from pyspark sql import SparkSessionfrom pyspark sql functions import spark partition id min max count create Spark sessionspark SparkSession builder appName questdb test getOrCreate load minute aggregated trade data into the dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable SELECT symbol sum amount as volume min price as minimum max price as maximum round max price min price as mid timestamp as ts FROM trades WHERE symbol BTC USD SAMPLE BY m ALIGN to CALENDAR AS fake table option partitionColumn ts option numPartitions option lowerBound T Z option upperBound T Z load df df withColumn partitionId spark partition id df groupBy df partitionId agg min df ts max df ts count df partitionId show truncate False partitionId min ts max ts count partitionId Looking good Now it seems that we can load any data from QuestDB into Spark by passing a SQL query to the DataFrame Do we really Our trades table is limited to three data types only What about all the other types you can find in QuestDB We expect that Spark will successfully map a double or a timestamp when queried from the database but what about a geohash It is not that obvious what is going to happen As always when unsure we should test Type mappingsI have another table in the database with a different schema This table has a column for each type currently available in QuestDB CREATE TABLE all types symbol SYMBOL string STRING char CHAR long LONG int INT short SHORT byte BYTE double DOUBLE float FLOAT bool BOOLEAN uuid UUID long LONG long LONG bin BINARY gc GEOHASH c date DATE timestamp TIMESTAMP timestamp timestamp PARTITION BY DAY INSERT INTO all types values sym str a true fb df dd ab cd to long rnd long rnd bin rnd geohash to date yyyy MM dd to timestamp T yyyy MM ddTHH mm ss SSS long is not fully supported by QuestDB yet so it is commented out Let s try to load and print the data we can also take a look at the schema of the DataFrame from pyspark sql import SparkSession create Spark sessionspark SparkSession builder appName questdb test getOrCreate create dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable all types load print the schemaprint df schema print the content of the dataframedf show truncate False Much to my surprise Spark managed to create the DataFrame and mapped 
all types Here is the schema StructType StructField symbol StringType True StructField string StringType True StructField char StringType True StructField long LongType True StructField int IntegerType True StructField short ShortType True StructField byte ShortType True StructField double DoubleType True StructField float FloatType True StructField bool BooleanType True StructField uuid StringType True StructField long StringType True StructField long StringType True StructField bin BinaryType True StructField gc StringType True StructField date TimestampType True StructField timestamp TimestampType True It looks pretty good but you might wonder if it is a good idea to map long and geohash types to String QuestDB does not provide arithmetics for these types so it is not a big deal Geohashes are basically base numbers represented and stored in their string format The bit long values are also treated as string literals long is used to store cryptocurrency private keys Now let s see the data symbol string char long int short byte double float bool uuid sym str a true fb df dd ab cd long bin xeecaaccedbbbdbcbbeecdefaec F D B F C A A C D A C B C gc date timestamp qk It also looks good but we could omit the from the end of the date field We can see that it is mapped to Timestamp and not Date We could also try to map one of the numeric fields to Decimal This can be useful if later we want to do computations that require high precision We can use the customSchema option to customize the column types Our modified code from pyspark sql import SparkSession create Spark sessionspark SparkSession builder appName questdb test getOrCreate create dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable all types option customSchema date DATE double DECIMAL load print the schemaprint df schema print the content of the dataframedf show truncate False The new schema StructType StructField symbol StringType True StructField string StringType True StructField char StringType True StructField long LongType True StructField int IntegerType True StructField short ShortType True StructField byte ShortType True StructField double DecimalType True StructField float FloatType True StructField bool BooleanType True StructField uuid StringType True StructField long StringType True StructField long StringType True StructField bin BinaryType True StructField gc StringType True StructField date DateType True StructField timestamp TimestampType True And the data is displayed as symbol string char long int short byte double float bool uuid sym str a true fb df dd ab cd long bin xeecaaccedbbbdbcbbeecdefaec F D B F C A A C D A C B C gc date timestamp qk It seems that Spark can handle almost all database types The only issue is long but this type is a work in progress currently in QuestDB When completed it will be mapped as String just like long Writing data back into the databaseThere is only one thing left writing data back into QuestDB In this example first we will load some data from the database and add two new features minute moving averagestandard deviation also calculated over the last minute windowThen we will try to save the modified DataFrame back into QuestDB as a new table We need to take care of some type mappings as Double columns are sent as FLOAT to QuestDB by default so we end up with this code from pyspark sql import SparkSessionfrom pyspark sql window import Windowfrom pyspark sql functions 
import avg stddev when create Spark sessionspark SparkSession builder appName questdb test getOrCreate load minute aggregated trade data into the dataframedf spark read format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable SELECT symbol sum amount as volume round max price min price as mid timestamp as ts FROM trades WHERE symbol BTC USD SAMPLE BY m ALIGN to CALENDAR AS fake table option partitionColumn ts option numPartitions option lowerBound T Z option upperBound T Z load add new featureswindow Window partitionBy df symbol rowsBetween Window currentRow df df withColumn ma avg df mid over window df df withColumn std stddev df mid over window df df withColumn std when df std isNull otherwise df std save the data as trades enriched df write format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable trades enriched option createTableColumnTypes volume DOUBLE mid DOUBLE ma DOUBLE std DOUBLE save All works but…we soon realize that our new table trades enriched is not partitioned and does not have a designated timestamp which is not ideal Obviously Spark has no idea of QuestDB specifics It would work better if we created the table upfront and Spark only saved the data into it We drop the table and re create it this time it is partitioned and has a designated timestamp DROP TABLE trades enriched CREATE TABLE trades enriched volume DOUBLE mid DOUBLE ts TIMESTAMP ma DOUBLE std DOUBLE timestamp ts PARTITION BY DAY The table is empty and waiting for the data We rerun the code all works no complaints The data is in the table and it is partitioned One aspect of writing data into the database is if we are allowed to create duplicates What if I try to rerun the code again without dropping the table Will Spark let me save the data this time No we get an error pyspark sql utils AnalysisException Table or view trades enriched already exists SaveMode ErrorIfExists The last part of the error message looks interesting SaveMode ErrorIfExists What is SaveMode It turns out we can configure what should happen if the table already exists Our options are errorifexists the default behavior is to return an error if the tablealready exists Spark is playing safe hereappend data will be appended to the existing rows already present in thetableoverwrite the content of the table will be replaced entirely by the newlysaved dataignore if the table is not empty our save operation gets ignored withoutany errorWe have already seen how errorifexists behaves append and ignore seem to be simple enough just to work However overwrite is not straightforward The content of the table must be cleared before the new data can be saved Spark will delete and re create the table by default which means losing partitioning and the designated timestamp In general we do not want Spark to create tables for us Luckily with the truncate option we can tell Spark to use TRUNCATE to clear the table instead of deleting it save the data as trades enriched overwrite if already existsdf write format jdbc option url jdbc postgresql localhost questdb option driver org postgresql Driver option user admin option password quest option dbtable trades enriched option truncate True option createTableColumnTypes volume DOUBLE mid DOUBLE ma DOUBLE std DOUBLE save mode overwrite The above works as expected ConclusionOur ride might seem bumpy but we finally have everythingworking Our 
new motto should be There is a config option for everything To summarize what we have found We can use Spark s JDBC data source to integrate with QuestDB It is recommended to use the dbtable option even if we use a SQL query to load data Always try to specify partitioning options partitionColumn numPartitions lowerBound and upperBound when loading data partitions ideally should match with the QuestDB partitions Sometimes it makes sense to cache some data in Spark to reduce the number of trips to the database It can be beneficial to push work down into the database depending on the task and how much data is involved It makes sense to make use of QuestDB s time series specific features such as SAMPLE BY instead of trying to rewrite it in Spark Type mappings can be customized via the customSchema option when loading data When writing data into QuestDB always specify the desired saving mode Generally works better if you create the table upfront and do not let Spark create it because this way you can add partitioning and designated timestamp If selected the overwrite saving mode you should enable the truncate option too to make sure Spark does not delete the table hence partitioning and the designated timestamp will not get lost Type mappings can be customized via the createTableColumnTypes option when saving data I mentioned only the config options which are most likely to be tweaked when working with QuestDB the complete set of options can be found here Spark data source options What could the future bring Overall everything works but it would be nice to have a much more seamless way of integration where partitioning would be taken care of automagically Some type mappings could use better defaults too when saving data into QuestDB Theoverwrite saving mode could default to use TRUNCATE More seamless integration is not impossible to achieve If QuestDB provided its own JDBCDialect implementation for Spark the above nuances could be handled We should probably consider adding this Finally there is one more thing we did not mention yet data locality That is because currently QuestDB cannot run as a cluster However we are actively working on a solution check out The Inner Workings of Distributed Databasesfor more information When the time comes we should ensure that data locality is also considered Ideally each Spark node would work on tasks that require partitions loaded from the local or closest QuestDB instance However this is not something we should be concerned about at this moment for now just enjoy data crunching 2023-04-06 16:40:46
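The Spark and QuestDB entry above walks through loading a table over JDBC, caching, aligning Spark partitions with time ranges, and writing results back with truncate enabled. As a compact illustration of that flow, here is a hedged PySpark sketch rather than the entry's exact code: the pgwire port, credentials, timestamp bounds, partition count, window size and the trades_enriched table are assumptions for demonstration, and the PostgreSQL JDBC driver jar is expected on the classpath (for example supplied via spark-submit --jars).

```python
# Hedged sketch (not the entry's exact code): read a QuestDB table over JDBC with
# explicit partitioning, derive a rolling average, and write the result back.
# Host/port, credentials, bounds, partition count, window size and table names
# are illustrative assumptions.
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import avg

spark = SparkSession.builder.appName("questdb-sketch").getOrCreate()

conn = {
    "url": "jdbc:postgresql://localhost:8812/questdb",  # assumed pgwire endpoint
    "driver": "org.postgresql.Driver",
    "user": "admin",
    "password": "quest",
}

# Partition options let Spark split the read into time-range tasks instead of
# pulling the whole table through a single task; cache() avoids re-querying
# the database every time the DataFrame is reused.
trades = (spark.read.format("jdbc")
          .options(**conn)
          .option("dbtable", "trades")
          .option("partitionColumn", "timestamp")
          .option("numPartitions", 3)                     # placeholder count
          .option("lowerBound", "2023-02-01T00:00:00Z")   # placeholder bounds
          .option("upperBound", "2023-02-04T00:00:00Z")
          .load()
          .cache())

# Rolling 60-row average per symbol (window size is an assumption).
w = Window.partitionBy("symbol").orderBy("timestamp").rowsBetween(-59, Window.currentRow)
enriched = (trades.withColumn("ma", avg("price").over(w))
                  .select("symbol", "timestamp", "price", "ma"))

# Write back into a pre-created table; truncate makes "overwrite" clear the table
# with TRUNCATE instead of dropping it, so QuestDB partitioning and the designated
# timestamp survive (assumes trades_enriched already exists with this schema).
(enriched.write.format("jdbc")
 .options(**conn)
 .option("dbtable", "trades_enriched")
 .option("truncate", True)
 .mode("overwrite")
 .save())
```

Pre-creating trades_enriched in QuestDB with PARTITION BY and a designated timestamp, then combining overwrite mode with the truncate option as the entry recommends, keeps those table properties intact across reruns.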
海外TECH DEV Community Exploring Google Zanzibar: A Demonstration of Its Basics https://dev.to/egeaytin/exploring-google-zanzibar-a-demonstration-of-its-basics-b1h Exploring Google Zanzibar A Demonstration of Its BasicsWelcome to our latest article on Google Zanzibar For those who haven t heard of it yet Zanzibar is the authorization system used by Google to handle authorization for hundreds of its services and products including YouTube Drive Calendar Cloud and Maps With the ability to handle complex authorization policies at a global scale Zanzibar processes billions of access control lists and queries per second In this article we will take a closer look at Zanzibar by implementing some of its fundamentals Specifically we will begin by exploring the Zanzibar data model and ReBAC Relationship Based Access Control Next we will create relational tuples which are analogous to ACL Access Control List style authorization data in Zanzibar We will then proceed to Zanzibar APIs and examine how Zanzibar handles modeling Please keep in mind that code blocks in this article are for demonstration purposes only and are not intended for production usage If you re looking for a fully fledged implementation or a centralized authorization solution that uses the Zanzibar permission model we ve got you covered We re currently building an open source authorization service inspired by Google Zanzibar which you can check out on our github repo To begin it is important to understand the data model of Zanzibar as it differs significantly from legacy authorization structures Zanzibar Data ModelDespite popular access control models such as Role Based Access Control RBAC and Attribute Based Access Control ABAC Zanzibar relies on relationship based access control which takes into account the relationships between resources and entities rather than solely relying on roles or attributes Now you might be wondering what are these relationships and how does Google Zanzibar utilize them to create complex authorization policies and enforce them efficiently What is relational based access control ReBAC exactly Relationship Based Access Control ReBAC is an access control model that takes into account relationships between subjects and objects when determining access permissions ReBAC extends traditional access control models by considering social or organizational relationships such as hierarchical or group based relationships in addition to the standard user to resource relationships For example ReBAC might allow a manager to access files of subordinates in their team or permit members of a certain project team to access specific resources associated with that project By incorporating relationship based factors ReBAC can provide a more nuanced and flexible access control mechanism that better reflects real world social structures and can help organizations to enforce security policies more effectively To give you a simple example let s take a look at an endpoint responsible for updating a given document Define a Document modelconst Document sequelize define Document id type DataTypes INTEGER primaryKey true autoIncrement true title type DataTypes STRING allowNull false content type DataTypes TEXT allowNull false Update a document by IDapp put documents id async req res gt const id parseInt req params id const document await Document findByPk id if document return res status send Document not found const title content req body await document update title content res send document In this example we re using Sequelize ORM to 
update a document in a PostgreSQL database on an express js endpoint As you might have noticed we re not checking any kind of access rights at the moment That means anybody can edit any document which isn t ideal right We can easily fix this by adding an authorization check Let s say that only the user who owns the document can edit it Update a document by IDapp put documents id async req res gt const id parseInt req params id const document await Document findByPk id if document return res status send Document not found Check if the user is authorized to edit the document if document owner req user id return res status send Unauthorized const title content req body await document update title content res send document With that we have added a relational access control in its simplest form ownership Not surprisingly in real world applications that probably won t be the only access control check you should make If we take a step back and look at this example from an organizational perspective we can see that there are probably more access rules to consider beyond just ownership For instance organizational roles such as admin or manager roles may need the ability to edit any document that belongs to their organization In addition resources like documents can have parent child relationships with various user groups and teams each with different levels of entitlements Things can get pretty complicated when we combine a bunch of access control rules and make the authorization structure more fine grained But no worries that s where Zanzibar comes in to save the day and make our lives a whole lot easier As we previously mentioned Zanzibar leverages relationships between users and resources to provide a powerful and adaptable approach to access control By taking into account these relationships Zanzibar is able to provide a fine grained level of control and flexibility that can be customized to fit a wide range of use cases Let s look at how Zanzibar leverages relations in more depth Zanzibar Relation TuplesIn Zanzibar access control policies are expressed as relations which are essentially tables that define the relationship between principals resources and permissions Each relation is defined by a schema that specifies the names and types of its columns The basic format for a relation tuple is lt object gt lt relation gt lt user gt In this format object represents the object or resource being accessed relation represents the relationship between the user and the object and user represents the user or identity that is requesting access For example suppose we have a Zanzibar system that is managing access to a set of files Here are some example relation tuples that might be used to represent access control policies file owner alice file viewer bob document maintainers team membersIn this example we have three relation tuples file owner alice This tuple represents the fact that Alice is the owner of file file viewer bob This tuple represents the fact that Bob has permission to view file document maintainer team member This represents members of team who are maintainers of document As we understand how relation tuples are structured let s quickly create tuples in Postgresql Database Sample Implementation in PostgreSQL It s important to note that Zanzibar built on Google Spanner DB the authors explained that they organized the database schema using a table per object namespace approach However we re exploring the data model and not much care about the scalability and performance right now So we will 
store the relation tuples in a single PostgreSQL database deviating from the original Zanzibar paper s approach Here is our tuple table to represent relation tuples CREATE TABLE tuples object namespace text NOT NULL object id text NOT NULL object relation text NOT NULL subject namespace text subject id text NOT NULL subject relation text sets Lets bump sample relation tuples into that INSERT INTO tuples object namespace object id object relation subject namespace subject id subject relation VALUES doc owner user NULL doc parent org doc owner org member org member user NULL org admin user NULL org member user NULL Respectively this will create following relation tuples user is owner of document doc belongs to organization members of org are owners of doc user is admin in org user is member in org Zanzibar APIsThe Zanzibar API consists of five methods read write watch check and expand Those methods can be used to manage access control policies and check permissions for resources The read write and watch methods are used for interacting directly with the authorization data relation tuples while the check and expand methods are focused specifically on authorization For this article we will be implementing the check API which is essential for enforcing authorization when using Zanzibar We will also be using a sample dataset to test our authorization logic Check APIThe check API enables applications to verify whether a specific subject such as a user group or team members has the necessary permissions to perform a particular action on a resource In our case it would verify whether a logged in user is authorized to edit document X How does the check API evaluate access decisions Zanzibar stores information about resources users and relationships in a centralized database And when a user requests access to a resource Zanzibar uses that information stored in its database to evaluate the request and determine whether access should be granted By doing so Zanzibar ensures that access control decisions can be made quickly and efficiently even for large scale distributed systems Check RequestIn our example we specify that users can edit docs only if the user is an owner of the document In this context we can call our check API as follows and expect to get a boolean value indicating whether the given subject user X has the specified relation owner editor etc to the given object document Y check subject id object relation object namespace object id We ll use this function in our edit endpoint But let s first build the logic behind it as a store procedure where we defined our tuples table CREATE OR REPLACE FUNCTION check p subject id text p object relation text p object namespace text p object id text RETURNS boolean LANGUAGE plpgsqlAS DECLARE var r record var b boolean BEGIN FOR var r IN SELECT object namespace object id object relation subject namespace subject id subject relation FROM tuples WHERE object id p object id AND object namespace p object namespace AND object relation p object relation ORDER BY subject relation NULLS FIRST LOOP IF var r subject id p subject id THEN RETURN TRUE END IF IF var r subject namespace IS NOT NULL AND var r subject relation IS NOT NULL THEN EXECUTE SELECT check USING p subject id var r subject relation var r subject namespace var r subject id INTO var b IF var b TRUE THEN RETURN TRUE END IF END IF END LOOP RETURN FALSE END Voila we implemented Zanzibar s check API Here s a more detailed breakdown of what this store procedure exactly does Declaring VariablesI declared a 
record variable var r and a boolean variable “var b DECLARE var r record var b boolean var rThe record variable var r is used to hold the current row of the result set returned by the SQL query in the FOR loop The loop iterates over the rows returned by the query one at a time and the values of the columns in the current row are assigned to the fields of the record variable This allows the function to access the values of the columns in the current row using field references instead of column names var bThe boolean variable var b is used to store the result of the recursive call to the same function When the function makes a recursive call to itself it passes the new parameters and expects a boolean result The boolean result indicates whether the relation exists or not The result of the recursive call is stored in the var b variable and is checked later in the loop If the result is TRUE the loop terminates early and the function returns TRUE If none of the recursive calls return TRUE the loop completes and the function returns FALSE Retrieve all related relation tuplesFOR var r IN SELECT object namespace object id object relation subject namespace subject id subject relation FROM tuples WHERE object id p object id AND object namespace p object namespace AND object relation p object relation ORDER BY subject relation NULLS FIRST This part of the code is responsible for retrieving all rows from the tuples table where the object namespace object id and object relation columns match the corresponding parameters passed to the function The ORDER BY clause sorts the rows by the subject relation column in ascending order with null values coming first The loop continues until all rows in the result set have been processed or until a RETURN statement is encountered which causes the function to terminate early and return a value Recursive Search to Conclude Access CheckLOOP IF var r subject id p subject id THEN RETURN TRUE END IF IF var r subject namespace IS NOT NULL AND var r subject relation IS NOT NULL THEN EXECUTE SELECT check USING p subject id var r subject relation var r subject namespace var r subject id INTO var b IF var b TRUE THEN RETURN TRUE END IF END IF END LOOP RETURN FALSE The function then starts a loop that retrieves rows from the tuples table based on the input parameters The loop checks if the “subject id column of the retrieved row matches the input “p subject id parameter If it does the function returns TRUE immediately as the relation exists If the subject id column of the retrieved row does not match the input “p subject id parameter the function checks if the row contains a subject relation and subject namespace If it does the function makes a recursive call to itself with the subject relation subject namespace and “subject id columns of the retrieved row as input parameters The result of the recursive call is stored in the var b variable If the value of var b is TRUE the function returns TRUE immediately as the relation exists If none of the retrieved rows have a matching “subject id column or a recursive call returns TRUE the function returns FALSE as the relation does not exist That s the end of the breakdown now let s implement this in our documents id endpoint using sequelize query to call our check function Update a document by IDapp put documents id async req res gt const id parseInt req params id const document await Document findByPk id if document return res status send Document not found Check if the user is authorized to edit the document const result await 
sequelize query CALL check userId objectRelation objectNamespace objectId replacements userId req user id objectRelation owner objectNamespace doc objectId id if result return res status send Unauthorized const title content req body await document update title content res send document So in this call we re checking whether user X has an owner relation with doc Y If we look at the relation tuples let s remember them user is owner of document doc belongs to organization members of org are owners of doc user is admin in org user is member in org user and user can edit document because user has direct ownership relation with document and user is member of org which we stated members of the org are owners of doc What if we need to extend the requirements for editing a document Such as we can say that admins of an organization which document X belong can edit document X In that case we need to make additional calls For example we can check whether user admin in organization And whether document belongs to organization Respectively check admin organization check parent organization Of course we don t want to make a separate call for every authorization rule Ideally we want to be able to manage everything with a single call To achieve this we need to create an authorization model or policy and feed it into the Zanzibar engine This allows Zanzibar to search for the given action such as edit push delete etc and the relevant relationships Then it can check each relationship to see whether a given subject i e a user or user set is authorized to perform the action Modeling in ZanzibarZanzibar handles modeling with namespace configurations A namespace configuration in Zanzibar is a set of rules that define how resources within a specific namespace can be accessed Such as a collection of databases or a set of virtual machines A possible document modeling could be represented asname doc relation name owner relation name editor userset rewrite union child computed userset relation owner child tuple to userset tupleset relation parent computed userset relation admin relation name parent This namespace configuration has three relations owner editor and parent Users who are members of the owner relation have full control over the resources in the doc namespace The parent relation specifies that the organization doc belongs to The editor relation specifies that users who are members of the owner relation or members of the admin relation in the organization specified by the parent relation can edit resources in the doc namespace Modeling in PermifyAlthough I appreciate the approach behind namespace configurations I must admit that it can be inconvenient to model complex cases You may need to create multiple namespaces and configure them separately At Permify we have developed an authorization language that aims to simplify the modeling process for those complex cases Our approach allows for more flexibility and customization while still ensuring that access control decisions can be made quickly and efficiently The language allows to define arbitrary relations between users and objects such as owner editor in our example further you can also define roles like user types such as admin manager member etc Here is a quick demonstration of how modeling is structured in Permify Relationship based policies and centralized authorization management combination solves so much for the the teams to get market fast as well as for the organizations especially those with large datasets and multiple segmentation of 
hierarchies or groups teams ConclusionDuring our journey to build a Zanzibar inspired solution we have discovered that ReBAC with centralized management duo is the desired approach for many use cases and most of the teams we discussed want to benefit from it Yet implementing such a system is not an easy decision Adopting a Zanzibar like solution requires designing your data model around it which means significant refactoring and changes in approach if you are currently using legacy authorization solutions based on roles or attributes associated with users or user sets At that point seeing new Zanzibar solutions to ease those frictions makes us happy and help us to see different angles around Zanzibar If you are interested in learning more about Zanzibar or believe that it may be beneficial to your organization please don t hesitate to join our community on discord We welcome the opportunity to chat and share our insights 2023-04-06 16:26:30
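The Zanzibar entry above implements its check API as a recursive PL/pgSQL function over a tuples table. The following is a minimal in-memory Python sketch of the same idea, with illustrative tuple values and identifier formats that are not taken from the article: a subject passes the check if a matching relation tuple references it directly, or indirectly through a userset (subject_namespace, subject_id, subject_relation) that itself checks out.

```python
# Hedged sketch of a Zanzibar-style check() over in-memory relation tuples.
# Tuple ids and names below are assumptions for demonstration only.
from typing import List, NamedTuple, Optional

class RelationTuple(NamedTuple):
    object_namespace: str
    object_id: str
    relation: str
    subject_namespace: Optional[str]  # None for a direct user reference
    subject_id: str
    subject_relation: Optional[str]   # None for a direct user reference

TUPLES: List[RelationTuple] = [
    RelationTuple("doc", "doc1", "owner", None, "user1", None),       # user1 owns doc1
    RelationTuple("doc", "doc1", "parent", None, "org1", None),       # doc1 belongs to org1
    RelationTuple("doc", "doc1", "owner", "org", "org1", "member"),   # members of org1 own doc1
    RelationTuple("org", "org1", "admin", None, "user2", None),       # user2 is admin of org1
    RelationTuple("org", "org1", "member", None, "user3", None),      # user3 is member of org1
]

def check(subject_id: str, relation: str, object_namespace: str, object_id: str) -> bool:
    """True if subject_id holds `relation` on the object, directly or via a userset."""
    for t in TUPLES:
        if (t.object_namespace, t.object_id, t.relation) != (object_namespace, object_id, relation):
            continue
        if t.subject_namespace is None and t.subject_id == subject_id:
            return True  # direct match
        if t.subject_namespace is not None and t.subject_relation is not None:
            # userset reference, e.g. ("org", "org1", "member"): recurse into it
            if check(subject_id, t.subject_relation, t.subject_namespace, t.subject_id):
                return True
    return False

if __name__ == "__main__":
    print(check("user1", "owner", "doc", "doc1"))  # True: direct ownership
    print(check("user3", "owner", "doc", "doc1"))  # True: via org1 membership
    print(check("user2", "owner", "doc", "doc1"))  # False: no rule maps org admins to owner here
```

Like the simplified stored procedure in the entry, this sketch performs no cycle detection or depth limiting, which a production resolver would need.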
海外TECH Engadget Tesla employees reportedly shared videos captured by cameras on customers' cars https://www.engadget.com/tesla-employees-reportedly-shared-videos-captured-by-cameras-on-customers-cars-165703126.html?src=rss Tesla employees reportedly shared videos captured by cameras on customers' cars. Some Tesla workers shared sensitive photos and videos captured by the cameras on owners' cars between each other for several years, according to Reuters. Former employees told the outlet that colleagues shared the images in group chats and one-on-one communications between and last year. One such video showed a Tesla driving at high speed before hitting a child on a bike, Reuters reported. Other footage included things like a nude man walking toward a vehicle. "We could see them doing laundry and really intimate things. We could see their kids," one of the former employees said. Workers are said to have sent each other videos taken inside Tesla owners' garages too. One clip reportedly showed a submersible white Lotus Esprit sub that appeared in the James Bond movie The Spy Who Loved Me. As it happens, Tesla CEO Elon Musk bought that vehicle a decade ago, suggesting that his employees were circulating footage that a vehicle captured inside his garage. The image-sharing practice "was a breach of privacy, to be honest," one of the former employees said. "And I always joked that I would never buy a Tesla after seeing how they treated some of these people." On its website, Tesla says each new vehicle it builds is equipped with eight external cameras. These support features such as Autopilot, Smart Summon and Autopark. They also enable the Sentry Mode surveillance system that captures footage of people approaching a parked Tesla and other seemingly suspicious activity. The company states in its customer privacy notice that it designed the camera system to protect user privacy. It says that even if owners opt in to share camera recordings with Tesla for "fleet learning" purposes, "camera recordings remain anonymous and are not linked to you or your vehicle," unless it receives the footage due to a safety event such as a crash or an airbag deployment. Even so, one employee said it was possible for Tesla data labelers to see the location of captured footage on Google Maps. Tesla does not have a communications department that can be reached for comment. This article originally appeared on Engadget. 2023-04-06 16:57:03
海外TECH Engadget Stem Player pocket-sized remixer adds unreleased J Dilla tracks https://www.engadget.com/stem-player-pocket-sized-remixer-adds-unreleased-j-dilla-tracks-165245151.html?src=rss Stem Player pocket-sized remixer adds unreleased J Dilla tracks. The puck-shaped audio remixing tool Stem Player by Kano started its life as a collaboration with controversial musician Kanye West, but it has expanded and partnered with the estate of deceased hip hop legend J Dilla. Users will be able to remix and rearrange J Dilla beats via an exclusive catalog of content selected by the producer's mother, Ma Dukes. The songs added to Stem Player have never been officially released, so your arrangement could end up being the de facto standard. Unfortunately, there aren't any tracks from iconic J Dilla albums like Donuts and Champion Sound. The many legendary tracks he produced for other artists like De La Soul and A Tribe Called Quest are also not available on this platform. Rights and all of that. There are other musicians involved with this update: Stem Player has announced some Flea and Salaam Remi tracks are available for remixing, though J Dilla is the guest of honor. To that end, the collection even includes a discussion about his legacy led by his mother. The company also announced it is working on a documentary about the producer and has released a green skin for the Stem Player as a tribute. For the uninitiated, the Stem Player is a puck-shaped device with physical controls to remix and rearrange audio tracks. In this context, "stems" refer to the basic tracks of a song, so you can use the device to change various attributes of each stem, such as volume. This gadget handles the actual raw and unmixed tracks from the artist. It does not use AI to separate each track after they are mixed. The end result? Better stems and more accurate controls. Kano has severed ties with beleaguered rapper Kanye West, but it has added Ghostface Killah to the roster prior to the J Dilla announcement. It has also recently released a projector used to remix visuals. The company has started crowdfunding to guarantee the release of future products, including a DIY headphone-building kit. All J Dilla tracks are available now, but you need a Stem Player; the custom green skin costs extra on top of that. This article originally appeared on Engadget. 2023-04-06 16:52:45
海外TECH Engadget The ASUS ROG Zephyrus Duo 16 gaming laptop is $800 off right now https://www.engadget.com/the-asus-rog-zephyrus-duo-16-gaming-laptop-is-800-off-right-now-161502193.html?src=rss The ASUS ROG Zephyrus Duo 16 gaming laptop is $800 off right now. If you've been looking to scoop up a new gaming laptop but a solitary screen doesn't quite cut the mustard, you should perhaps consider the ASUS ROG Zephyrus Duo 16. It's our current pick for the best gaming laptop with dual displays. Best of all, it's on sale right now: one variant is a whopping $800 off the regular price. This configuration comes with a ROG Nebula HDR QHD display. It's powered by an AMD Ryzen HX CPU and NVIDIA GeForce RTX Ti GPU, and the DDR RAM should help ensure you can play most games without too many hitches. You'll have a decent volume of storage space for your games too, as this ASUS ROG Zephyrus Duo has a TB-class SSD. The internal specs aren't what make this gaming laptop stand out, though. It's that second screen that sits between the keyboard and the main display. The ScreenPad Plus could be handy for productivity, allowing you to keep an eye on certain apps while keeping most of your focus on more important tasks up top. It might help you keep tabs on the news, social media or a show you're watching while getting some work done. Or you might use it to keep Discord open while you play games, or pull up a walkthrough on YouTube if you get stuck. Alternatively, you could use it to monitor your viewership stats while you stream your gameplay. We gave the ASUS ROG Zephyrus Duo a certainly respectable score in our review. The device isn't all sunshine and roses, unfortunately: we felt that it's fairly bulky, with high-pitched fans and an underwhelming battery life. The touchpad, which ASUS scuttled off to the right side of the keyboard, is a bit awkward too. Still, for those hunting for a good deal on a dual-screen laptop, you won't find many better options elsewhere at the minute. Follow @EngadgetDeals on Twitter and subscribe to the Engadget Deals newsletter for the latest tech deals and buying advice. This article originally appeared on Engadget. 2023-04-06 16:15:02
Cisco Cisco Blog Two decades of IoT innovation for a future of possibilities https://feedpress.me/link/23532/16060485/two-decades-of-iot-innovation-for-a-future-of-possibilities digitize 2023-04-06 16:54:09
Cisco Cisco Blog Deploying the Wi-Fi Network at Cisco Live EMEA 2023 https://feedpress.me/link/23532/16060427/deploying-the-wi-fi-network-at-cisco-live-emea-2023 Deploying the Wi-Fi Network at Cisco Live EMEA 2023. This is a short story about the inside experience of deploying the network at Cisco Live EMEA, from the point of view of a Cisco CX Wi-Fi engineer. 2023-04-06 16:00:38
海外科学 NYT > Science Biden Administration to Curb Toxic Pollutants From Chemical Plants https://www.nytimes.com/2023/04/06/climate/biden-toxic-pollutants-chemical-plants.html Biden Administration to Curb Toxic Pollutants From Chemical Plants. The rule would affect the majority of chemical manufacturers, which have plants spread across the Gulf Coast, the Ohio River Valley and in West Virginia. 2023-04-06 16:07:32
海外科学 NYT > Science What You Need to Know About Turbulence on Airplanes https://www.nytimes.com/article/airplane-turbulence.html What You Need to Know About Turbulence on Airplanes. Recent incidents with turbulence during air travel raise questions about this challenging weather phenomenon. Here's what we know about it and how to stay safe. 2023-04-06 16:36:02
金融 Financial Services Agency website Agenda for the Business Accounting Council general meeting published https://www.fsa.go.jp/singi/singi_kigyou/siryou/kaikei/20230331.html corporate accounting 2023-04-06 17:00:00
ニュース BBC News - Home Give me more power to sack officers - Met chief https://www.bbc.co.uk/news/uk-65203633?at_medium=RSS&at_campaign=KARANGA profile 2023-04-06 16:10:23
ニュース BBC News - Home Man murdered pregnant wife by pushing her off Arthur's Seat https://www.bbc.co.uk/news/uk-scotland-edinburgh-east-fife-65199907?at_medium=RSS&at_campaign=KARANGA september 2023-04-06 16:08:33
ニュース BBC News - Home Easter travel: Millions set for getaway with delays likely https://www.bbc.co.uk/news/uk-65206317?at_medium=RSS&at_campaign=KARANGA delays 2023-04-06 16:38:29
ニュース BBC News - Home Ex-Italy PM Silvio Berlusconi in intensive care https://www.bbc.co.uk/news/world-europe-65199641?at_medium=RSS&at_campaign=KARANGA milan 2023-04-06 16:42:44
ニュース BBC News - Home Nus Ghani: No action after Tory Islamophobia sacking probe https://www.bbc.co.uk/news/uk-politics-65202889?at_medium=RSS&at_campaign=KARANGA ghani 2023-04-06 16:48:02
ニュース BBC News - Home Katelan Coates: Extensive searches for missing Todmorden 14-year-old https://www.bbc.co.uk/news/uk-england-leeds-65205083?at_medium=RSS&at_campaign=KARANGA march 2023-04-06 16:01:34
ニュース BBC News - Home DP World Tour: LIV Golf players told to pay £100,000 over 'serious breaches' https://www.bbc.co.uk/sport/golf/65198669?at_medium=RSS&at_campaign=KARANGA DP World Tour: LIV Golf players told to pay £100,000 over 'serious breaches'. The DP World Tour wins its arbitration case against LIV Golf players after an arbitration panel ruled in its favour. 2023-04-06 16:16:26
Azure Azure updates General availability: Read replicas for Azure Database for PostgreSQL Flexible Server https://azure.microsoft.com/ja-jp/updates/general-availability-read-replicas-for-azure-database-for-postgresql-flexible-server-3/ General availability: Read replicas for Azure Database for PostgreSQL Flexible Server. The Dlsv and Dldsv VM series provide GiBs per vCPU and can offer lower price points within the general-purpose Azure Virtual Machines portfolio. 2023-04-06 17:00:19
