Posted 2023-04-08 08:24:49 | RSS feed digest for 2023-04-08 08:00 (34 items)

Category | Site | Article title / trend word | Link URL | Frequent words / summary / search volume | Registered
IT ビジネス+IT 最新ニュース What has short-form video changed about business? A new management resource used by Elon Musk and Akio Toyoda https://www.sbbit.jp/article/cont1/111296?ref=rss As video becomes the dominant way people consume information, short-form video is booming on a global scale. 2023-04-08 07:10:00
AWS AWS Getting Started with AWS Firewall Manager | Amazon Web Services https://www.youtube.com/watch?v=FGhKpPDBvXc AWS Firewall Manager simplifies your administration and maintenance tasks across multiple accounts and resources for a variety of protections, including AWS WAF, AWS Shield Advanced, Amazon VPC security groups, AWS Network Firewall, and Amazon Route 53 Resolver DNS Firewall. With Firewall Manager you set up your protections just once, and the service automatically applies them across your accounts and resources, even as you add new accounts and resources. About AWS: Amazon Web Services is the world's most comprehensive and broadly adopted cloud platform, offering fully featured services from data centers globally; millions of customers, including the fastest-growing startups, largest enterprises, and leading government agencies, use AWS to lower costs, become more agile, and innovate faster. 2023-04-07 22:41:41
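The description above amounts to defining a protection policy once and letting Firewall Manager roll it out organization-wide. As a rough, non-authoritative illustration of the API surface involved, here is a minimal boto3 sketch that checks the delegated administrator account and lists the policies currently being applied; the client calls are standard Firewall Manager APIs, while the region and the assumption that a delegated administrator has already been configured are mine.

import boto3

# Firewall Manager calls are made from the delegated administrator account
# (assumed here); the client is regional.
fms = boto3.client("fms", region_name="us-east-1")

# Confirm which account is the Firewall Manager administrator.
admin = fms.get_admin_account()
print("FMS admin account:", admin["AdminAccount"], "status:", admin["RoleStatus"])

# List the protection policies Firewall Manager applies across accounts and resources.
resp = fms.list_policies(MaxResults=50)
for policy in resp["PolicyList"]:
    print(policy["PolicyName"], policy["SecurityServiceType"], policy["PolicyId"])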
python New posts tagged Python - Qiita [ChatGPT] Learn with an AI teacher! A Python programming lesson for beginners https://qiita.com/key353/items/c6c31082c669d774453d chatgpt 2023-04-08 07:11:29
海外TECH DEV Community Introducing AWS Lambda Response Streaming https://dev.to/aws-builders/introducing-streaming-response-from-aws-lambda-511f Introducing AWS Lambda Response Streaming.

Today AWS announced support for streaming responses from Lambda functions. This long-awaited capability lets developers stream a response from a function to its callers without waiting for the entire response to be finished. It is especially useful for server-side rendering, commonly used by modern JavaScript frameworks. Let's dive in.

Enabling. To enable streaming responses, developers have to modify their function code slightly: the handler needs to be wrapped in a new decorator available in the Lambda Node.js runtime. Here's an example from the launch post:

exports.handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
    responseStream.setContentType("text/plain");
    responseStream.write("Hello, world!");
    responseStream.end();
});

If you're familiar with Node's writable stream API, you'll recognize that this decorator implements one. AWS suggests you use stream pipelines to write to the stream; again, here's the example from the launch post:

const pipeline = require("util").promisify(require("stream").pipeline);
const zlib = require("zlib");
const { Readable } = require("stream");

exports.gzip = awslambda.streamifyResponse(async (event, responseStream, context) => {
    // As an example, convert the event to a readable stream
    const requestStream = Readable.from(Buffer.from(JSON.stringify(event)));
    await pipeline(requestStream, zlib.createGzip(), responseStream);
});

Apart from something like server-side HTML rendering, this feature also helps transmit media back to API callers. Here's an example of a Lambda function returning an image using response streaming:

const fs = require("fs");

// Response streaming function which loads a large image
const image = awslambda.streamifyResponse(async (event, responseStream, context) => {
    responseStream.setContentType("image/jpeg");
    let result = fs.createReadStream("large_photo.jpeg");
    await pipeline(result, responseStream);
});

You can watch the response stream into the browser as it arrives.

Calling these functions. If you're going to invoke a function that issues a streaming response programmatically with the Node.js AWS SDK, you'll need to use v3. I've written about this change extensively, but most importantly for this feature, the v2 SDK does not appear to be supported at all, so you'll need to upgrade before you can take advantage of streaming responses. Invoking a function with a streaming response is also supported in the AWS SDK for Java 2.x and in both major versions of the AWS SDK for Go. I'd hope Python's boto3 support is coming soon.

But wait, one catch. Developers can use this capability only with the newer Lambda Function URL integration. Function URLs are one of several ways to trigger a Lambda function via an HTTP request, which I've covered previously in another post. This is a bit limiting in terms of authentication mechanisms, but most importantly it lets developers stream responses larger than the previous response payload limit afforded by AWS Lambda.

My take. If you're using Lambda to serve media such as images, videos, or audio, streaming responses will help immensely. That's not been a core use case for me personally, but I suspect this will be most leveraged by developers using Lambda to serve frontend applications with server-side rendering; for those users, this launch is particularly exciting. Ultimately, response streaming is an important step in bringing Lambda closer to what users can get in more traditional server-ful compute environments. It's an exciting new feature, and I'm looking forward to seeing the capabilities it unlocks for users.

Wrapping up. As always, if you liked this post, you can find more of my thoughts on my blog and on Twitter. 2023-04-07 22:23:49
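Because streaming responses are only exposed through Function URLs, a caller that cannot use the Node.js, Java, or Go SDKs can still consume the stream over plain HTTP. A minimal sketch in Python using only the standard library, assuming a hypothetical Function URL with auth type NONE; the URL and request payload are placeholders of mine, not from the article.

import json
import urllib.request

# Hypothetical Function URL of a streaming Lambda; replace with your own.
FUNCTION_URL = "https://abc123.lambda-url.eu-west-1.on.aws/"

req = urllib.request.Request(
    FUNCTION_URL,
    data=json.dumps({"name": "world"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Read the body in chunks as Lambda flushes them, instead of waiting
# for the whole response to be buffered.
with urllib.request.urlopen(req) as resp:
    while True:
        chunk = resp.read(1024)
        if not chunk:
            break
        print(chunk.decode("utf-8", errors="replace"), end="")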
海外TECH DEV Community Cracking the Frontend Interview, Part 2: HTML https://dev.to/bybydev/cracking-the-frontend-interview-part-2-html-2bi3 Cracking the Frontend Interview, Part 2: HTML.

JavaScript frameworks are getting powerful and convenient; you can write a production website purely in JavaScript. HTML is slowly moving behind the scenes and being generated automatically, so it gets more and more underrated and more developers skip learning it seriously. But dare to say something bad about HTML in public, or call it optional to learn, and you'll get snarled at by experienced developers who might barely have written a raw HTML tag in years. They will blame you for not adopting semantic markup and accessibility, so it will certainly hurt you in a frontend interview.

The HTML Living Standard actually defines a lot of things related to CSS, JavaScript, and browser implementation, which can be confusing to separate. So if you don't see a topic here (HTML parsing, the DOM, Web APIs, and so on), there's a good chance it will be covered in the following parts. This post is not a how-to on writing HTML or an exhaustive list of HTML interview Q&A; there are plenty of HTML references and style guides out there if you want to learn HTML in depth. It only covers some very important topics related to HTML: what they are, and why on earth we need to learn them.

Standard. There's confusion over who controls the HTML standard, the W3C or the WHATWG. In 2019 the W3C announced that the WHATWG would be the sole publisher of the HTML and DOM standards. The current HTML Living Standard is developed by the WHATWG, which is made up of the major browser vendors: Apple, Google, Mozilla, and Microsoft. HTML5 is the fifth and current major version of HTML and has the following building blocks:
- Semantic elements to describe your web content more precisely
- First-class embedded audio and video multimedia elements
- Secure and bigger client-side data storage
- 2D/3D graphics support using canvas and SVG
- Better communication with the server using WebSockets or WebRTC
- Many more APIs to communicate with the browser and hardware
- Encouragement of accessibility to better support impaired audiences
- Deprecation of HTML presentational elements in favour of CSS for styling

Semantic. What is semantic markup? Simply using the right elements for the right purposes. HTML defines tons of semantic elements; your job is to use as many of them as possible instead of polluting your site with div. Who benefits most from this? The robots. Yes, you can create a beautiful website without caring about semantics at all; those semantic elements were created to help automated tools such as search engines, web crawlers, and screen readers read your site more easily. Semantic markup also makes your life easier when reading source code, debugging HTML, joining new projects, or crawling the web. No more creative, unconventional class names like <div class="nav">, <div class="navbar">, <div class="nav-bar">, <div class="my-nav">, <div class="awesome-nav">, or <div class="leave-me-alone"> when <nav> can play the hero. I developed many web crawlers (personal use only) years ago, and I really hated sites with unpredictable class naming conventions, dynamic HTML tags, or confusing site structures. Thanks to semantic markup, the life of a crawler developer is better than ever.

Metadata. I'm talking about metadata in the <head> of the HTML document: an awesome place to put tons of useful things that are easily consumed by automated tools. They are not visible to end users but can be inspected with browser developer tools.

Meta tags for web browsers. These are basic tags browsers use to learn more about your site and render it properly:

<head>
  <!-- specify character encoding for the HTML document -->
  <meta charset="UTF-8">
  <!-- control the page's dimensions and scaling -->
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <!-- set a base URL for all relative URLs -->
  <base href="https://example.com/" target="_blank">
  <!-- define styles -->
  <style>
    body { background-color: white; }
    h1 { color: red; }
    p { color: blue; }
  </style>
  <!-- load a stylesheet -->
  <link rel="stylesheet" href="style.css">
  <!-- define a client-side script -->
  <script>
    function myFunction() {
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    }
  </script>
  <!-- HTML parsing is paused for a normal script to be fetched and executed -->
  <script src="normal-script.js"></script>
  <!-- an async script is fetched asynchronously with HTML parsing;
       it is executed when ready and blocks HTML parsing -->
  <script async src="async-script.js"></script>
  <!-- a defer script is fetched asynchronously with HTML parsing;
       it is executed when HTML parsing has finished -->
  <script defer src="defer-script.js"></script>
</head>

Meta tags for search engines. Search engines like Google are really smart and powerful; besides server-side-rendered pages, they claim they can crawl client-side-rendered ones too. It's common to use the following tags to feed search engines some important info:

<head>
  <title>Harry Wilson</title>
  <meta name="description" content="Software Engineer">
  <meta name="keywords" content="JavaScript, Fullstack">
  <meta name="author" content="Harry Wilson">
  <meta name="robots" content="index, follow">
  <link rel="canonical" href="https://example.com/">
</head>

Meta tags for social sharing. It's quite easy and straightforward to get beautiful sharing cards on Facebook, Twitter, LinkedIn, and Pinterest; most of them support the Open Graph protocol. Use tools such as the Facebook Sharing Debugger, the Twitter Card Validator, or metatags.io to test social sharing:

<head>
  <!-- Facebook Open Graph -->
  <meta property="og:type" content="website">
  <meta property="og:url" content="https://example.com/">
  <meta property="og:title" content="Harry Wilson">
  <meta property="og:description" content="Software Engineer">
  <meta property="og:image" content="https://example.com/avatar.png">
  <meta property="og:site_name" content="example">
  <!-- Twitter also supports Open Graph -->
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:image:alt" content="Avatar">
  <meta name="twitter:site" content="@example">
</head>

And don't forget that you can put literally any custom metadata in the <head> for any purpose: analytics, prompts to install apps on mobile browsers, site verification, tracking, you name it.

Microdata. Most frontend developers don't use microdata, a part of the HTML standard that encourages webmasters to add more indicators of what the content is all about. You only need this when you want your site to be understood even better by search engines, on top of semantic markup and metadata in the <head>:

<div itemscope itemtype="https://schema.org/Event">
  <div itemprop="name">Spinal Tap</div>
  <span itemprop="description">One of the loudest bands ever reunites for an unforgettable two-day show.</span>
  Event date:
  <time itemprop="startDate" datetime="2011-05-08T19:30">May 8, 7:30pm</time>
</div>

Major search engines use vocabularies from schema.org along with the Microdata, RDFa, or JSON-LD formats. Search result pages are getting more complicated, with carousels, context-based mobile rich snippets, sponsored results, direct plain-text answers, and so on. To maximize the chance of your site being displayed in these results, you have to feed search engines even more.
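The common thread in the sections above is that semantic tags, head metadata, and microdata mostly exist to be read by machines. A minimal sketch of that consumer side, using only Python's standard-library html.parser to pull the title, description, and Open Graph properties out of a page; the sample HTML below is invented for illustration, not taken from the article.

from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collects <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and "content" in attrs:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

sample = """<head>
  <title>Harry Wilson</title>
  <meta name="description" content="Software Engineer">
  <meta property="og:title" content="Harry Wilson">
  <meta property="og:type" content="website">
</head>"""

parser = MetaExtractor()
parser.feed(sample)
print(parser.title)                    # Harry Wilson
print(parser.meta.get("description"))  # Software Engineer
print(parser.meta.get("og:title"))     # Harry Wilson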
Accessibility. Web accessibility is a practice encouraged by the HTML standard to help people with disabilities consume the web. It's an important movement in the effort to provide equal opportunity to everyone, made possible by authoring tools, web content, user agents, assistive technologies, and especially you, frontend developers. The Web Content Accessibility Guidelines are long and complicated; I bet not many of you have read them, but you're already doing a decent job if you follow semantic markup. Accessibility is underrated not just by newbie developers but by experienced ones too, mainly because we don't see a direct benefit or immediate impact from following it; most of us are perfectly physically able. It may be okay to ignore accessibility on a site almost no one uses, but take great care of it when developing popular ones. Making it a habit from the start is even better; many people around the world will be delighted with what you're doing.

Internationalization. An English-only website is fine for individuals and small companies, but i18n is a huge deal when you're developing sites for popular international brands, and knowing about it matters when applying to big enterprises. I18n means supporting different character sets, writing directions, and approaches to formatting and storing data, handling cultural differences, and, most importantly, keeping the design flexible enough to accommodate local needs. To support i18n well, most frontend developers use JavaScript frameworks with i18n support; doing it in pure HTML is extremely painful and not repeatable. JavaScript frameworks simplify many aspects:
- Creating multiple language versions of your app
- Displaying dates, numbers, percentages, and currencies
- Preparing text in component templates for translation
- Handling plural forms of words
- Handling alternative text

Google AMP. Google announced an open-source HTML framework called AMP (Accelerated Mobile Pages) for better and faster experiences on the mobile web, or so Google said. It was created in an effort to compete with Facebook Instant Articles. AMP can be used for websites, stories, ads, and emails, with three main restrictions: you must use AMP's custom HTML components (amp-something) instead of standard HTML tags, JavaScript usage is very restricted, and all of your site's resources are served via AMP's CDN with many optimizations. The main benefit of using AMP is making your site eligible for Google's Top Stories carousel, a powerful way to boost mobile SEO and click-through rates for news content. AMP has been very controversial over the years; people blame it for going against the open web and serving Google's own interests. Think twice before adopting it, and do so at your own risk, because you're moving into a parallel world with a very unpleasant developer experience.

HTML Email. We use HTML emails every day, but we normally don't write them; they are usually generated by email clients or email marketing services. Still, it's good to know how to write raw HTML emails, because it can be useful at startups when you need to build transactional email templates. To support the wide variety of not-so-innovative email clients, you can only use a very small subset of HTML and CSS, with a lot of limitations compared to web HTML. It's safe to use static table-based layouts, simple inline CSS, HTML tables and nested tables, and web-safe fonts. Be cautious with background images, custom fonts, and embedded CSS. Typical email clients aren't as up to date as web browsers, so simply stay away from JavaScript, iframes, Flash, embedded audio and video, forms, and div layering.

Conclusions. HTML is slowly moving behind the scenes. With the rise of JavaScript frameworks it's hard to find frontend developers who write vanilla HTML these days; HTML will still be around for years to come, but it's simple and stable enough to be generated automatically. If you're using a bootstrap-style framework, you'll rarely write raw HTML tags. You'll learn a lot by developing a vanilla website; maybe you won't have the chance to do this in production, but if you try it you'll realize that many JavaScript and CSS frameworks do way too much. I'm not saying those complicated frameworks are bad; they cover many features that would take you years to implement on your own. I ignored many basic topics in this post, so explore more on your own to build solid, exhaustive knowledge of HTML. I recommend the following resources to dive deeper on the quest for a frontend job: the HTML Living Standard, exhaustive HTML references, a basic overview of HTML, some difficult HTML questions, and the critical rendering path. 2023-04-07 22:10:57
海外TECH DEV Community Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift https://dev.to/aws/working-with-managed-workflows-for-apache-airflow-mwaa-and-amazon-redshift-40nd Working with Managed Workflows for Apache Airflow (MWAA) and Amazon Redshift.

I was recently looking at some Stack Overflow questions from the AWS Collective and saw a number of folk asking about the integration between Amazon Redshift and Managed Workflows for Apache Airflow (MWAA). I thought I would put together a quick post to help address what looked like the most common challenges. There is some code that accompanies this post, which you can find in the GitHub repository cdk-mwaa-redshift.

Pre-requisites. You will need the following to use the code in this repo:
- AWS CDK and Node installed
- An AWS Region where MWAA is available
- Sufficient privileges to deploy resources within your AWS account
- Costs: I only ran this for a few hours, so if you do deploy this code, make sure you delete and remove all the resources created once you have finished to ensure you do not incur further costs.

Which operator? One of the first things I noticed is that there are a number of methods for orchestrating tasks that interact with Amazon Redshift. Apache Airflow uses operators to simplify your life when working with downstream systems like Amazon Redshift, so depending on what you are trying to achieve, your first task is to identify which Apache Airflow operator you want to use.

Provider packages. Apache Airflow bundles operators into provider packages. The version of Apache Airflow you are using AND the Airflow provider packages you have defined in your requirements.txt dictate which operators, and which versions of them, you have. The easiest way to see what is up and running is from the Apache Airflow UI, by selecting Providers under the Admin tab. Why is this important? When working with Redshift, the Redshift operator classes change depending on the version of Airflow and the version of the Amazon provider package. For example, in earlier versions you would use:

from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator

but in newer versions you use:

from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

This is something to look out for when working with these operators. These are the four approaches I see come up most often:
- PythonOperator: use boto3 to write Python code that interacts with your Redshift clusters directly
- RedshiftSQLOperator: uses an Airflow-defined connection (by default, redshift_default) and lets you define SQL statements to run
- RedshiftDataOperator: works via the Redshift Data API and connects using your AWS credentials (by default, aws_default)
- RedshiftToS3Transfer and S3ToRedshiftOperator: these operators either load data into your Redshift cluster from an S3 bucket or export data from your Redshift cluster to an S3 bucket

These are the operators I could think of; if you are using something different, let me know and I will update this post. The rest of this post looks at using them. What we will do is:
- set up an Amazon Redshift cluster, a Managed Workflows for Apache Airflow environment, and the necessary S3 buckets
- automate uploading the sample data into our S3 buckets
- create the tables in our Redshift database using Airflow operators
- use Airflow operators to import the sample data
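Since the import path for the Redshift operators depends on the installed version of the Amazon provider package, it can be useful to check that version from code as well as from the Providers page in the UI. A small sketch using the standard library's importlib.metadata; the distribution names are the standard ones and nothing here is specific to the post's environment.

from importlib.metadata import version, PackageNotFoundError

# The distribution names used by Airflow core and the Amazon provider package.
for dist in ("apache-airflow", "apache-airflow-providers-amazon"):
    try:
        print(dist, version(dist))
    except PackageNotFoundError:
        print(dist, "is not installed")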
Setting up my test environment. To make things easier, I have put together a simple CDK app that deploys a new VPC and, into that VPC, an MWAA environment and an Amazon Redshift cluster. The stacks also deploy some sample DAGs which we will use. The repo is cdk-mwaa-redshift, and we can deploy it as follows:

git clone <the cdk-mwaa-redshift repository>
cd cdk-mwaa-redshift
cdk deploy mwaa-demo-utils
cdk deploy mwaa-demo-vpc
cdk deploy mwaa-demo-dev-environment
cdk deploy mwaa-demo-redshift

You will need to modify app.py and update the AWS account and AWS Region, as well as define unique S3 bucket names for your installation; the current app.py contains example values which need to be changed or your deployment will fail. You will be prompted after each stack to confirm you are happy to deploy, so after reviewing the security changes, answer "y" to continue. After a short wait you should have everything set up.

RedshiftSQLOperator. We now have a completely new setup with no sample data. Our next step is to upload the sample data, which is described in the Redshift documentation pages, and we will use Airflow operators to automate these tasks. First we upload the files to the S3 bucket that was created when the Redshift cluster was set up (in my default app.py this bucket is called mwaa-redshift). To do this we do not need a specific Redshift operator; we just use the PythonOperator, defining a function that downloads the sample archive and uploads its contents to our S3 bucket:

S3_BUCKET = "mwaa-redshift"
DOWNLOAD_FILE = "<URL of the sample tickit data zip from the Redshift documentation>"
S3_FOLDER = "sampletickit/"

def download_zip():
    s3c = boto3.client("s3")
    indata = requests.get(DOWNLOAD_FILE)
    with zipfile.ZipFile(io.BytesIO(indata.content)) as z:
        zlist = z.namelist()
        print(zlist)
        for i in zlist:
            print(i)
            zfiledata = BytesIO(z.read(i))
            s3c.put_object(Bucket=S3_BUCKET, Key=S3_FOLDER + i, Body=zfiledata)

dag = DAG(
    "setup_sample_data_dag",
    default_args=default_args,
    description="Setup sample data from the Redshift documentation",
    schedule_interval=None,
    start_date=datetime(2022, 1, 1),  # date garbled in the feed; any fixed past date works
    catchup=False,
)

files_to_s3 = PythonOperator(
    task_id="files_to_s3",
    python_callable=download_zip,
    dag=dag,
)

files_to_s3

The next step is to create the tables as per the guide, so that when we ingest the data it goes into the right tables. For this we use the first Redshift operator, RedshiftSQLOperator. We define a variable in our DAG that holds the SQL we want to execute (copied from the guide above), and then use the operator to run that query and create our tables:

sample_data_tables_venue_sql = """
create table IF NOT EXISTS public.venue(
    venueid smallint not null distkey sortkey,
    venuename varchar(100),
    venuecity varchar(30),
    venuestate char(2),
    venueseats integer
);
"""

create_sample_venue_tables = RedshiftSQLOperator(
    task_id="create_sample_venue_tables",
    sql=sample_data_tables_venue_sql,
    redshift_conn_id=REDSHIFT_CONNECTION_ID,
    dag=dag,
)

And that is it. We just need to create one of these for each of the tables we want to create, and then define the flow of the DAG so they run in parallel:

files_to_s3 >> create_sample_users_tables
files_to_s3 >> create_sample_category_tables
files_to_s3 >> create_sample_venue_tables
files_to_s3 >> create_sample_date_tables
files_to_s3 >> create_sample_listing_tables
files_to_s3 >> create_sample_sales_tables
files_to_s3 >> create_sample_event_tables

You can see the complete DAG in the repo.
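Rather than writing out one RedshiftSQLOperator per table by hand, the same parallel fan-out can be generated in a loop. This is a sketch of my own, not the post's code: the table-to-DDL mapping is a placeholder, and it reuses the dag, files_to_s3, REDSHIFT_CONNECTION_ID, and sample_data_tables_venue_sql names defined in the snippet above.

from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

# Hypothetical mapping of table name -> CREATE TABLE statement.
create_table_sql = {
    "venue": sample_data_tables_venue_sql,
    # "users": ..., "category": ..., "date": ..., "listing": ..., "sales": ..., "event": ...
}

for table_name, ddl in create_table_sql.items():
    create_task = RedshiftSQLOperator(
        task_id=f"create_sample_{table_name}_tables",
        sql=ddl,
        redshift_conn_id=REDSHIFT_CONNECTION_ID,
        dag=dag,
    )
    # every CREATE TABLE task runs in parallel after the upload task
    files_to_s3 >> create_task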
Before we run this, we need to define how the operator connects to our Redshift cluster: this is the redshift_conn_id value we set in the DAG, which we need to create as an Apache Airflow Connection. From the Admin tab, select Connections and click the + to add a new connection. In my setup I used the following values:
- Connection Id: I used default_redshift_connection; you can use whatever you want as long as it matches what you set in your operator within the DAG
- Connection Type: Amazon Redshift
- Extra: a JSON string describing the Redshift cluster to connect to, for example {"iam": true, "cluster_identifier": "mwaa-redshift-cluster", "port": 5439, "region": "<your region>", "db_user": "awsuser", "database": "mwaa"}

If you need to connect to different Redshift clusters, you create additional connections this way, each with a unique Connection Id, and then reference that Id in your operator. Once this is created we can trigger our DAG. With luck you should find that you now have the sample data in your S3 bucket and seven tables created. Well done; now it is time to import that data into our Redshift tables.

RedshiftToS3Transfer and S3ToRedshiftOperator.

S3ToRedshiftOperator. Now that we have our data in our Amazon S3 bucket, the next step is to show how we can move data between Amazon S3 and Amazon Redshift. The guide on setting up the sample data manually provides all the necessary Redshift COPY commands, and we will be using those. While most are the same, some use slightly different syntax, which is typical of real-life use cases. Here is an example:

copy users from 's3://<myBucket>/tickit/allusers_pipe.txt'
iam_role default
delimiter '|' region '<aws-region>';

So let us put together a DAG using the S3ToRedshiftOperator:

# Example of how to use S3ToRedshiftOperator
from datetime import datetime, timedelta
from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Replace these with your own values
REDSHIFT_CONNECTION_ID = "default_redshift_connection"
S3_BUCKET = "mwaa-redshift"
REDSHIFT_SCHEMA = "mwaa"

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "s3_to_redshift_dag",
    default_args=default_args,
    description="Load CSV from S3 to Redshift",
    schedule_interval=None,
    start_date=datetime(2022, 1, 1),
    catchup=False,
)

load_s3_to_redshift = S3ToRedshiftOperator(
    task_id="load_s3_to_redshift",
    s3_bucket=S3_BUCKET,
    s3_key="tickit/allevents_pipe.txt",
    schema=REDSHIFT_SCHEMA,
    table="event",
    copy_options=["CSV", "DELIMITER '|'"],
    redshift_conn_id=REDSHIFT_CONNECTION_ID,
    dag=dag,
)

load_s3_to_redshift

When we trigger this DAG, we get output similar to the following:

[timestamp] UTC {s3_to_redshift.py} INFO - Executing COPY command...
[timestamp] UTC {base.py} INFO - Using connection ID 'default_redshift_connection' for task execution.
[timestamp] UTC {sql.py} INFO - Running statement: COPY mwaa.public.event FROM 's3://mwaa-redshift/tickit/allevents_pipe.txt' credentials 'aws_access_key_id=***;aws_secret_access_key=***;token=***' CSV DELIMITER '|' FILLRECORD IGNOREHEADER, parameters: None
[timestamp] UTC {s3_to_redshift.py} INFO - COPY command complete...

And when we check our Redshift cluster, we can see the event data in the database. Awesome: we have used the S3ToRedshiftOperator to ingest the data. We now need to do this for all the remaining files. We can tweak the import behaviour by setting copy_options using standard Redshift COPY configuration values; for a complete reference, see the airflow.providers.amazon.aws.transfers.s3_to_redshift documentation page. Check out the repo for the complete DAG, where you will notice that each load uses slightly different options that map to the data being ingested.

RedshiftToS3Transfer. Now that we have some data in our Redshift cluster, we will use the next operator, RedshiftToS3Operator, to export it to an S3 bucket that Redshift has permission to access. We first define some variables for our Redshift database and the export destination:

REDSHIFT_CONNECTION_ID = "default_redshift_connection"
S3_BUCKET = "mwaa-redshift"
S3_EXPORT_FOLDER = "export"
REDSHIFT_SCHEMA = "mwaa"
REDSHIFT_TABLE = "public.sales"

and then it is as simple as using the operator, defining the target S3 bucket and key (folder), and selecting the database and table we want to export:

transfer_redshift_to_s3 = RedshiftToS3Operator(
    task_id="transfer_redshift_to_s3",
    redshift_conn_id=REDSHIFT_CONNECTION_ID,
    s3_bucket=S3_BUCKET,
    s3_key=S3_EXPORT_FOLDER,
    schema=REDSHIFT_SCHEMA,
    table=REDSHIFT_TABLE,
    dag=dag,
)

We can then trigger this DAG, and we should see a new folder called export containing the exported contents of the Redshift table. We can tweak the export by setting unload_options using standard Redshift UNLOAD configuration values; for a complete reference, check out the airflow.providers.amazon.aws.transfers.redshift_to_s3 documentation page. You can see the complete DAG in the repo.

RedshiftDataOperator. When we created our Redshift tables we used the RedshiftSQLOperator; we can use a different operator to achieve the same result. So what is different, I hear you shout at your screens? The RedshiftDataOperator uses the Redshift Data API and goes through the AWS API, whereas the RedshiftSQLOperator connects over the wire protocol. When using the RedshiftDataOperator we do not need to define the redshift_conn_id connection, as it uses the default AWS one, but we do need to define some variables so the operator knows which Redshift cluster, database, and user to use:

REDSHIFT_CLUSTER = "mwaa-redshift-cluster"
REDSHIFT_USER = "awsuser"
REDSHIFT_SCHEMA = "mwaa"
POLL_INTERVAL = 10

We define a variable to hold the SQL we want to run, in this case to create a Redshift view:

create_view_sql = """
create view loadview as
(select distinct tbl, trim(name) as table_name, query, starttime,
        trim(filename) as input, line_number, colname, err_code,
        trim(err_reason) as reason
 from stl_load_errors sl, stv_tbl_perm sp
 where sl.tbl = sp.id);
"""

and then we create our task:

create_redshift_tblshting_view = RedshiftDataOperator(
    task_id="create_redshift_tblshting_view",
    cluster_identifier=REDSHIFT_CLUSTER,
    database=REDSHIFT_SCHEMA,
    db_user=REDSHIFT_USER,
    sql=create_view_sql,
    poll_interval=POLL_INTERVAL,
    wait_for_completion=True,
    dag=dag,
)

We can now view the complete Redshift sample data from the Redshift query editor (I am using the nice v2 version). You can see the complete DAG in the repo.

PythonOperator. The final method is writing some Python code within your DAG and running it via the PythonOperator. Why might you do this? The only use case I can think of is where what you are trying to achieve cannot be done with the previous operators.

Before using the PythonOperator, I would encourage you to consider this: if you have something that is currently not achievable with the operators listed above, go to the open source project, discuss it in the Airflow Slack channel, and then raise an issue for a feature request. That is the first step in adding new functionality to these operators, so think about whether it makes sense for you. Why should you consider doing this? If you can push the functionality you need upstream, that is less code for you to worry about and maintain. You would, or rather should, still play your part and contribute upstream, but the force-multiplier effect of community will help you reduce the burden of managing that code. Something to think about.

With that out of the way, how would you use the PythonOperator to work with Redshift? You create a function within your DAG that contains the logic for what you want to do, and then invoke it via the operator. In this example DAG, which I put together a few years back before I had discovered the Redshift operators, you can see how I created tables in Redshift:

def create_redshift_table():
    rsd = boto3.client("redshift-data")
    resp = rsd.execute_statement(
        ClusterIdentifier=redshift_cluster,
        Database=redshift_db,
        DbUser=redshift_dbuser,
        Sql="CREATE TABLE IF NOT EXISTS " + redshift_table_name + " (title character varying, rating int);",
    )
    print(resp)
    return "OK"

create_redshift_table_if_not_exists = PythonOperator(
    task_id="create_redshift_table_if_not_exists",
    python_callable=create_redshift_table,
    dag=dag,
)

In the same DAG I also used this approach to mimic the S3-to-Redshift transfer operator:

def s3_to_redshift(**kwargs):
    ti = kwargs["task_instance"]
    queryId = ti.xcom_pull(key="return_value", task_ids="join_athena_tables")
    athenaKey = "s3://" + s3_bucket_name + "/athena-results/join_athena_tables/" + queryId + "_clean.csv"
    sqlQuery = "copy " + redshift_table_name + " from '" + athenaKey + "' iam_role '" + redshift_iam_arn + "' CSV IGNOREHEADER 1;"
    rsd = boto3.client("redshift-data")
    resp = rsd.execute_statement(
        ClusterIdentifier=redshift_cluster,
        Database=redshift_db,
        SecretArn=redshift_secret_arn,
        Sql=sqlQuery,
    )
    return "OK"

transfer_to_redshift = PythonOperator(
    task_id="transfer_to_redshift",
    python_callable=s3_to_redshift,
    provide_context=True,
    dag=dag,
)

As you can see, using the Redshift operators is simpler and means I have much less work to do.

Diving deeper with the RedshiftSQLOperator. In this post we have only scratched the surface of what you can do with Apache Airflow and Amazon Redshift. Before we finish, I want to share a few more advanced topics that are worth knowing about and will give you more confidence in orchestrating your Redshift tasks with Apache Airflow.

Setting up Redshift clusters. You can actually create and delete Redshift clusters using Apache Airflow; the example DAGs include one that does a complete setup and teardown of a Redshift cluster. There are a few things to think about, however. First, you should not store the password in the DAG itself, where it can be seen by far more people than necessary. I would encourage you to either store it as a Variable within the Apache Airflow UI or, better still, integrate AWS Secrets Manager and retrieve the password as a variable that way.

Integration with AWS Secrets Manager. Some of you might have gone down the path of integrating MWAA with AWS Secrets Manager for your Variables and Connections. If you have done this, then rather than creating connections via the Apache Airflow UI (Admin > Connections), you need to create them in AWS Secrets Manager. First, create an AWS Secret containing the connection details as a connection string. To make this easier, you can use a small piece of Python code to generate it, changing the values for your setup:

import urllib.parse

conn_type = "redshift"
host = ""      # Specify the Amazon Redshift cluster endpoint
login = ""     # Specify the username to use for authentication with Amazon Redshift
password = ""  # Specify the password to use for authentication with Amazon Redshift
role_arn = urllib.parse.quote_plus("YOUR_EXECUTION_ROLE_ARN")
region = "YOUR_REGION"

conn_string = "{0}://{1}:{2}@{3}?role_arn={4}&region={5}".format(
    conn_type, login, password, host, role_arn, region
)
print(conn_string)

Running this generates a string; copy it and use it when creating your AWS Secret. Here is an example of doing that via the CLI:

aws secretsmanager create-secret \
  --name airflow/connections/default_redshift_connection \
  --description "Apache Airflow to Redshift Cluster" \
  --secret-string "paste value here" \
  --region your-region

Now when Apache Airflow goes looking for default_redshift_connection, it will grab the details via AWS Secrets Manager.
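Before wiring the secret into MWAA, it can save a debugging round-trip to confirm that the secret name and value resolve as expected. A quick boto3 sketch, assuming the secret name used above, a region of my own choosing, and that your credentials are allowed to read it; the name prefix must match whatever connections lookup prefix you configured for the MWAA Secrets Manager backend.

import boto3

sm = boto3.client("secretsmanager", region_name="eu-west-1")  # assumed region

# The secret created above; Airflow strips the configured prefix and treats
# the remainder ("default_redshift_connection") as the connection id.
resp = sm.get_secret_value(SecretId="airflow/connections/default_redshift_connection")

# The value should be the connection string generated earlier,
# e.g. redshift://user:pass@host:5439?role_arn=...&region=...
print(resp["SecretString"])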
The second thing is to ensure that you lock down access to your Redshift cluster to only the clients that need it. In the example, access is wide open (an inbound rule of 0.0.0.0/0 with a test description), which is fine for testing, but make sure you lock this down if you plan to use this for real workloads. The final thing to bear in mind is that, depending on what you plan to do, the MWAA execution role needs the right level of permissions. In the example CDK stack that is part of this tutorial, IAM privileges are scoped down to just the cluster name, and I did not add any of the DELETE actions. Do not grant broad access; define appropriate least-privilege policies based on what you actually need to do.

Connecting to your Redshift cluster. The next topic is how your MWAA environment connects to your Redshift cluster. This depends on a few things: which operator you are using, and where your Redshift and Apache Airflow environments are located. When you use the RedshiftSQLOperator, RedshiftToS3Transfer, or S3ToRedshiftOperator, you connect via the wire protocol; when you use the RedshiftDataOperator, or the PythonOperator with boto3, you access Redshift via the AWS API. This matters because it determines how Apache Airflow knows how to connect to the Redshift cluster you want. The RedshiftSQLOperator, RedshiftToS3Transfer, and S3ToRedshiftOperator use an Airflow Connection document to define how to connect, and this is configured on the operator itself; you can find more details in the official documentation.

A common reason for not being able to connect to Redshift from Apache Airflow is that the security groups have not been updated with a new inbound rule. In the simplest scenario, where both the Apache Airflow environment and the Redshift cluster are in the same VPC, it should be as simple as adding the Apache Airflow security group as a source in the Redshift security group's inbound rules (the original post includes a screenshot of the Redshift security group with such an inbound rule). Where they are in different VPCs, you will need to configure something like VPC peering, which is outside the scope of this post, but let me know if you would like me to cover it. If you are still struggling to connect, there is a knowledge article that might provide more info.

Using XComs to grab results. The Redshift operators automatically generate XCom output when they run. However, I have found this depends on the version of the Redshift operators you are using, so make sure you are on the latest, up-to-date ones. For example, I tried the same DAG on two provider versions: with the older one I got no XCom output when running a query with the operator, but after upgrading, the query ID was generated automatically.

In this simple example DAG I run a query which counts some of the rows in one of the tables we have uploaded. I want to run the query and make sure the results end up in XCom, because I might want to pass the output to the next task, for example:

query_sql = """
with x as (
    select sellerid, count(*) as rows_count
    from public.listing
    where sellerid is not null
    group by sellerid
)
select count(*) as id_count,
       count(case when rows_count > 1 then 1 end) as duplicate_id_count,
       sum(rows_count) as total_row_count,
       sum(case when rows_count > 1 then rows_count else 0 end) as duplicate_row_count
from x;
"""

run_query_sql = RedshiftDataOperator(
    task_id="query_sql",
    cluster_identifier=REDSHIFT_CLUSTER,
    database=REDSHIFT_SCHEMA,
    db_user=REDSHIFT_USER,
    sql=query_sql,
    poll_interval=POLL_INTERVAL,
    wait_for_completion=True,
    dag=dag,
)

When I run this and check XComs, I get the Redshift query Id. I now need to grab it, so I create a quick function and call it via the PythonOperator:

def get_results(**kwargs):
    print("Getting Results")
    ti = kwargs["task_instance"]
    runqueryoutput = ti.xcom_pull(key="return_value", task_ids="query_sql")
    client = boto3.client("redshift-data", region_name="eu-west-1")
    response = client.get_statement_result(Id=runqueryoutput)
    query_data = response["Records"]
    print(query_data)
    return query_data

get_query_results = PythonOperator(
    task_id="get_query_results",
    python_callable=get_results,
    dag=dag,
)

And that is all I need. The key piece is to ensure that task_ids is set to the correct task, otherwise it will not return the correct value:

ti = kwargs["task_instance"]
runqueryoutput = ti.xcom_pull(key="return_value", task_ids="query_sql")

When I run this DAG I can see the output in XComs in the Apache Airflow UI, and also in the task log:

[timestamp] {logging_mixin.py} INFO - Getting Results
[timestamp] {logging_mixin.py} INFO - [[{'longValue': ...}, {'longValue': ...}, {'longValue': ...}, {'longValue': ...}]]
[timestamp] {python.py} INFO - Done. Returned value was: [[{'longValue': ...}, {'longValue': ...}, {'longValue': ...}, {'longValue': ...}]]

Conclusion. In this short post we explored a number of ways you can interact with Apache Airflow and Amazon Redshift. I hope it helps you understand your options and gets you started. If you found this post helpful, please give me some feedback by completing the very short survey linked in the original. Before you go, make sure you check out the next section and delete or remove all the resources you have just set up.

Cleaning up. To remove all the resources deployed, use the following commands:

cd cdk-mwaa-redshift
cdk destroy mwaa-demo-redshift
cdk destroy mwaa-demo-dev-environment
cdk destroy mwaa-demo-vpc

You will need to delete the S3 buckets manually.

Troubleshooting. What to do when it all goes horribly wrong? In this section I share some of the errors I hit while putting this post together, in the hope that it helps if you see the same ones.

Connection timeouts. When I thought everything had been set up correctly, triggering the DAG produced an extended pause and then an error:

[timestamp] {taskinstance.py} ERROR - Task failed with exception
Traceback (most recent call last):
  File ".../site-packages/redshift_connector/core.py", line ..., in __init__
    self._usock.connect((host, port))
TimeoutError: Connection timed out
During handling of the above exception, another exception occurred: ...

It turned out that my MWAA security group had not been added to the Redshift security group as an inbound rule allowing access on the Redshift port (5439 by default). Once I added it, the problem was resolved. In the CDK stack that accompanies this post, this is configured for you automatically.

Errors importing listing_pipe.txt. While most of the import worked fine, listing_pipe.txt stubbornly refused to ingest. The error in the Airflow task was:

redshift_connector.error.ProgrammingError: ... Load into table 'listing' failed. Check 'stl_load_errors' system table for details ... copy.c ... CheckMaxRowError

I found out that you can create a view within Redshift that helps you troubleshoot these kinds of issues:

create view loadview as
(select distinct tbl, trim(name) as table_name, query, starttime,
        trim(filename) as input, line_number, colname, err_code,
        trim(err_reason) as reason
 from stl_load_errors sl, stv_tbl_perm sp
 where sl.tbl = sp.id);

With this I was able to see that one line was empty; when I checked the file, sure enough there was a blank line. I then adjusted the DAG using the MAXERROR parameter:

listing_load_s3_to_redshift = S3ToRedshiftOperator(
    task_id="listing_load_s3_to_redshift",
    s3_bucket=S3_BUCKET,
    s3_key="tickit/listings_pipe.txt",
    schema=REDSHIFT_SCHEMA,
    table="public.listing",
    copy_options=["CSV", "DELIMITER '|'", "MAXERROR 2"],  # exact MAXERROR value garbled in the feed; it just needs to tolerate the blank line
    redshift_conn_id=REDSHIFT_CONNECTION_ID,
    dag=dag,
)

2023-04-07 22:05:18
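The loadview created above can also be queried without a SQL client, going through the same Redshift Data API the DAGs use. A standalone boto3 sketch of my own: the region, cluster, database, and user names simply reuse the values assumed earlier in the post.

import time
import boto3

rsd = boto3.client("redshift-data", region_name="eu-west-1")  # assumed region

# Run the troubleshooting query against the view created by the DAG.
stmt = rsd.execute_statement(
    ClusterIdentifier="mwaa-redshift-cluster",
    Database="mwaa",
    DbUser="awsuser",
    Sql="select table_name, input, line_number, colname, reason from loadview order by starttime desc limit 10;",
)

# Poll until the statement finishes, then print the failed-load details.
while True:
    desc = rsd.describe_statement(Id=stmt["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(2)

if desc["Status"] == "FINISHED" and desc.get("HasResultSet"):
    result = rsd.get_statement_result(Id=stmt["Id"])
    for record in result["Records"]:
        print(record)
else:
    print("Statement ended with status:", desc["Status"], desc.get("Error"))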
海外TECH CodeProject Latest Articles Return Coefficients of the Linear Regression Model Using MLPack C++ https://www.codeproject.com/Tips/5358233/Return-Coefficients-of-the-Linear-Regression-Model mlpack 2023-04-07 22:05:00
Finance 金融総合:経済レポート一覧 FX Daily (April 6): Nervous price action after weak US initial jobless claims http://www3.keizaireport.com/report.php/RID/533224/?rss fxdaily 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 The TSE's disclosure request seeking analysis of market valuations such as share price and P/B: prices are set by the market, but deeper analysis is needed (financial and securities markets, funding) http://www3.keizaireport.com/report.php/RID/533227/?rss Daiwa Institute of Research 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 What is the next issue in US banking jitters? Bank worries await earnings; spillover to markets and unclear policy responses are the concerns (US) http://www3.keizaireport.com/report.php/RID/533229/?rss risk factors 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Ironically, wages are now running well above the average of Governor Kuroda's term; YCC heads for revision (Behind the economic scenes) http://www3.keizaireport.com/report.php/RID/533230/?rss Dai-ichi Life Research Institute 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Is the US equity boom winding down? Investment trust trends for March 2023 (Researcher's eye) http://www3.keizaireport.com/report.php/RID/533235/?rss United States 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 SCR ratios of major European insurance groups at end-2022 (3): reported figures under Solvency II (capital transactions, etc.) (Insurance and Pension Focus) http://www3.keizaireport.com/report.php/RID/533237/?rss capital 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Investment trust market overview, April 2023 issue (as of end-March 2023) http://www3.keizaireport.com/report.php/RID/533257/?rss release 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 DC investment trust market overview, April 2023 issue (as of end-March 2023) http://www3.keizaireport.com/report.php/RID/533258/?rss release 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Revision of the FSA's administrative guidelines for crypto-asset exchange service providers: clarifying whether various tokens qualify as crypto-assets, etc. http://www3.keizaireport.com/report.php/RID/533261/?rss pwcjapan 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Transaction Media Networks (TSE Growth): one-stop cashless payment services connecting payment providers and merchants, supporting credit cards, e-money, QR and barcode payments and more (analyst report) http://www3.keizaireport.com/report.php/RID/533266/?rss payment services 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Toyokumo (TSE Growth): corporate cloud services such as safety-confirmation and kintone-integration services; expanding existing cloud services and targeting enterprise offerings (analyst report) http://www3.keizaireport.com/report.php/RID/533267/?rss kintone 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Stamen (TSE Growth): engagement-management platform business supporting in-house information sharing; earnings growing as more companies adopt its TUNAG platform (analyst report) http://www3.keizaireport.com/report.php/RID/533268/?rss tunag 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 STI Food Holdings (TSE Standard): an R&D-driven seafood products maker mainly supplying Seven-Eleven; points to watch remain raw-material prices and the delayed start-up of the Shiga plant (analyst report) http://www3.keizaireport.com/report.php/RID/533269/?rss R&D 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 FX Weekly (April 7, 2023 issue): next week's FX outlook (1) USD/JPY: the Ueda BOJ gets under way http://www3.keizaireport.com/report.php/RID/533271/?rss fxweekly 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Four warning signs of a March crisis: watch for spreading liquidity risk and solvency risk (Mizuho RT EXPRESS) http://www3.keizaireport.com/report.php/RID/533273/?rss mizuhortexpress 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 40th-anniversary dialogue "The IMES as an innovation center", part 2: the untold story of e-money research 30 years ago (IMES newsletter) http://www3.keizaireport.com/report.php/RID/533274/?rss Bank of Japan Institute for Monetary and Economic Studies 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Points to watch in domestic and overseas economies and markets (2023/4/7): US markets may see a mix of recession worries and hopes for slowing inflation (financial and securities markets, funding) http://www3.keizaireport.com/report.php/RID/533275/?rss Daiwa Institute of Research 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 What are private assets? (animation) http://www3.keizaireport.com/report.php/RID/533280/?rss release 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Saving and investing painlessly with reward points http://www3.keizaireport.com/report.php/RID/533285/?rss Japan Association for Financial Planners 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 Japanese equities: five points on the Nikkei 225 (April 2023) http://www3.keizaireport.com/report.php/RID/533287/?rss Nikkei 225 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 [Trending search keyword] Wooden buildings http://search.keizaireport.com/search.php/-/keyword=木造建築物/?rss wooden construction 2023-04-08 00:00:00
Finance 金融総合:経済レポート一覧 [Recommended book] Exceptional companies revealed by 13 million employee reviews https://www.amazon.co.jp/exec/obidos/ASIN/4492534628/keizaireport-22/ job change 2023-04-08 00:00:00
ニュース BBC News - Home Masters play suspended as trees fall close to crowd https://www.bbc.co.uk/sport/av/golf/65217052?at_medium=RSS&at_campaign=KARANGA augusta 2023-04-07 22:02:59
ニュース BBC News - Home Supreme Court's Clarence Thomas defends luxury trips https://www.bbc.co.uk/news/world-us-canada-65215407?at_medium=RSS&at_campaign=KARANGA thomas 2023-04-07 22:23:42
ニュース BBC News - Home Masters 2023: Rory McIlroy set to miss cut as Brooks Koepka leads at Augusta https://www.bbc.co.uk/sport/golf/65216791?at_medium=RSS&at_campaign=KARANGA Rory McIlroy's hopes of landing the Masters look to be over for this year after a stormy day at Augusta on which fans avoided injury from falling trees. 2023-04-07 22:27:48
ニュース BBC News - Home Heineken Champions Cup: Leinster 55-24 Leicester Tigers - Leinster into semi-finals https://www.bbc.co.uk/sport/rugby-union/65214762?at_medium=RSS&at_campaign=KARANGA Leinster score seven tries against Leicester to become the first side into this season's Heineken Champions Cup semi-finals. 2023-04-07 22:31:56
Business 東洋経済オンライン The trick behind Nakau cutting its oyakodon price by 40 yen even as egg prices soar: what does it do to the business? Reading the logic of a contrarian strategy (Dining out) https://toyokeizai.net/articles/-/665013?utm_source=rss&utm_medium=http&utm_campaign=link_back 東洋経済オンライン 2023-04-08 08:00:00
