python |
New posts tagged Python - Qiita |
Managing a repository for a custom-model machine learning project |
https://qiita.com/nonbiri15/items/b1eb47b7ef96a3dd5854
|
Machine Learning |
2023-03-19 20:48:13 |
python |
New posts tagged Python - Qiita |
Fine-tuning Alpaca-LoRA on Japanese tasks |
https://qiita.com/toshi_456/items/280efc31950ddb083286
|
alpacalora |
2023-03-19 20:47:21 |
Ruby |
New posts tagged Ruby - Qiita |
Adding a "Like" feature to articles (1) |
https://qiita.com/RikutoMatsumoto/items/ea83203f0048e74f35d9
|
likes |
2023-03-19 20:52:50 |
Docker |
New posts tagged docker - Qiita |
Introducing RSpec and basic operations |
https://qiita.com/Krtech/items/d3bb9b6192efe5159ff4
|
osventuradockerrubyprai |
2023-03-19 20:03:01 |
golang |
New posts tagged Go - Qiita |
[GitHub Actions][Go] Getting started with Go testing, and automating pull request tests and notifications along the way |
https://qiita.com/Fuses-Garage/items/21a8e0a5254a47e4df71
|
githubactions |
2023-03-19 20:27:09 |
Git |
New posts tagged Git - Qiita |
Managing a repository for a custom-model machine learning project |
https://qiita.com/nonbiri15/items/b1eb47b7ef96a3dd5854
|
Machine Learning |
2023-03-19 20:48:13 |
Git |
New posts tagged Git - Qiita |
About GitHub SSH configuration (private and public keys) |
https://qiita.com/ikura_ooo/items/12a6f474b1b80122212d
|
github |
2023-03-19 20:14:24 |
Ruby |
New posts tagged Rails - Qiita |
Adding a "Like" feature to articles (1) |
https://qiita.com/RikutoMatsumoto/items/ea83203f0048e74f35d9
|
likes |
2023-03-19 20:52:50 |
Ruby |
New posts tagged Rails - Qiita |
Introducing RSpec and basic operations |
https://qiita.com/Krtech/items/d3bb9b6192efe5159ff4
|
osventuradockerrubyprai |
2023-03-19 20:03:01 |
Overseas TECH |
DEV Community |
An Introduction to AWS Batch |
https://dev.to/aws-builders/an-introduction-to-aws-batch-2pei
|
An Introduction to AWS Batch

AWS Batch is a fully managed service that helps us developers run batch computing workloads on the cloud. The goal of this service is to effectively provision infrastructure for the batch jobs we submit, while we can focus on writing the code that deals with business constraints. Batch jobs running on AWS are essentially Docker containers that can be executed on different environments. AWS Batch supports job queues deployed on EC2 instances, on ECS clusters with Fargate, and on Amazon EKS (Elastic Kubernetes Service). Regardless of what we choose as the basis of our infrastructure, the provisioning of the necessary services and the orchestration of the jobs is managed by AWS.

Components of AWS Batch

Although one of the selling points of AWS Batch is to simplify batch computing on the cloud, it has a bunch of components, each requiring its own configuration. The components required for a job executing on the AWS Batch service are the following:

Jobs: jobs are Docker containers wrapping units of work, which we submit to an AWS Batch queue. Jobs can have names and they can receive parameters from their job definition.

Job Definitions: a job definition specifies how a job should run. Job definitions can have an IAM role to provide access to other AWS services, information about the memory and CPU requirements of the job, and other properties required for the job, such as environment variables, container properties and mount points for extra storage.

Job Queues: jobs are submitted to job queues. The role of a job queue is to schedule jobs and execute them on compute environments. Jobs can have a priority, based on which they can be scheduled to run on multiple different compute environments; the job queue itself can decide which job should be executed first and on which compute environment.

Compute Environments: compute environments are essentially ECS clusters. They contain the Amazon ECS container instances used for the containerized batch jobs. We can have managed or unmanaged compute environments. With managed compute environments, AWS Batch decides the capacity and the EC2 instance type required for the job, in case we decide to run our jobs on EC2; alternatively, we can use a Fargate environment, which runs our containerized batch job on instances entirely hidden from us and fully managed by AWS. With unmanaged compute environments, we manage our own compute resources, which requires that our compute environments use an AMI that meets the AWS ECS required AMI specifications.

Multi-node jobs and GPU jobs

AWS Batch supports multi-node parallel jobs that span multiple EC2 instances. They can be used for parallel data processing, high-performance computing applications, and training machine learning models. Multi-node jobs can run only on managed compute environments. In addition to multi-node jobs, we can enhance the underlying EC2 instances with graphics cards (GPUs). This can be useful for operations relying on parallel processing, such as deep learning.

AWS Batch: when to use it?

AWS Batch is recommended for any task which requires a lot of time, memory, or computing power to run. This can be a vague statement, so let's see some examples of use cases for AWS Batch:
- High-performance computing: tasks that require a lot of computing power, such as running usage analytics on huge amounts of data, automatic content rendering, transcoding, etc.
- Machine learning: as we've seen before, AWS Batch supports multi-node jobs and GPU-powered jobs, which can be essential for training ML models.
- ETL: we can use AWS Batch for ETL (extract, transform and load) tasks.
- Any other task which may take up a lot of time (hours or days).
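To make the relationship between jobs, job queues and job definitions concrete, here is a minimal boto3 sketch that submits a job and polls it until completion; the queue and job definition names are hypothetical placeholders, not values from the article.

import time

import boto3

# Hypothetical names; use the queue and job definition created in your account.
JOB_QUEUE = "movies-job-queue"
JOB_DEFINITION = "movies-job-definition"

batch = boto3.client("batch")

# Submit a job; AWS Batch schedules it onto a compute environment behind the queue.
job_id = batch.submit_job(
    jobName="import-movies",
    jobQueue=JOB_QUEUE,
    jobDefinition=JOB_DEFINITION,
)["jobId"]
print(f"Submitted job {job_id}")

# Poll until the job reaches a terminal state.
while True:
    status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
    print(f"Job status: {status}")
    if status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(30)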
While these use cases may sound cool, I suggest having caution before deciding whether AWS Batch is the right choice for us. AWS offers a bunch of other products configured for specialized use cases. Let's walk through a few of these.

AWS Batch vs AWS Glue / Amazon EMR

A while above, it was mentioned that AWS Batch can be used for ETL jobs. While this is true, we may want to step back and take a look at another service such as AWS Glue. AWS Glue is a fully managed solution developed specifically for ETL jobs. It is a serverless option offering a bunch of choices for data preparation, data integration and ingestion into several other services, and it relies on Apache Spark. Similarly, Amazon EMR is also an ETL solution for petabyte-scale data processing, relying on open-source frameworks such as Apache Spark, Apache Hive and Presto. My recommendation would be to use Glue/EMR if we are comfortable with the technologies they rely on. If we want to have something custom, built by ourselves, we can stick to AWS Batch.

AWS Batch vs SageMaker

We've also seen that AWS Batch can be used for machine learning. Again, while this is true, it is a crude way of doing machine learning. AWS offers SageMaker, a dedicated machine learning and data science platform. SageMaker can run its own jobs, which can also be enhanced with GPU computing power. While SageMaker is a one-stop shop for everything related to machine learning, AWS Batch is an offering for executing long-running tasks. If we have a machine learning model implemented and we just need the computing power to do the training, we can use AWS Batch; other than this, SageMaker would probably make way more sense for everything ML-related.

AWS Batch vs AWS Lambda

AWS Lambda can also be an alternative to AWS Batch jobs. For certain generic tasks, a simple Lambda function can be more appropriate than a fully fledged batch job. We can consider using a Lambda function when:
- the task is not that compute-intensive: AWS Lambda can have up to 6 vCPU cores and up to 10 GB of RAM;
- we know that our task will finish within Lambda's 15-minute execution limit.
If we can adhere to these Lambda limitations, I strongly suggest using Lambda instead of AWS Batch. Lambda is considerably easier to set up and it has way fewer moving parts. We can simply focus on the implementation details rather than dealing with the infrastructure.

Building an AWS Batch Job

In the upcoming sections we will put all these things together and build an AWS Batch job from scratch. For the sake of this exercise, let's assume we have a movie renting website and we want to present movie information, with ratings from critics, to our customers. The purpose of the batch job will be to import a set of movie ratings into a DynamoDB table at certain intervals.

For the movies dataset we will use one from Kaggle, which I had to download first and upload to an S3 bucket (a Kaggle limitation). Usually, if we are running a similar service in production, we will pay a provider that exposes a dataset for us; since Kaggle does not offer an easy way to automate downloads, I had to save the dataset into an S3 bucket first. Also, one may question the usage of a batch job considering that the data size might not be that big, and a Lambda function may be sufficient to accomplish the same goal. While this is true, for the sake of this exercise we will stick to Batch. A simplified architectural diagram of what we want to accomplish can be seen here.
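The manual step mentioned above (getting the Kaggle archive into the S3 bucket) can also be scripted. A small boto3 sketch, with a hypothetical bucket name and key; in the Terraform code below these values end up as var.bucket_name and var.file_path:

import boto3

# Hypothetical values; replace them with your own bucket and object key.
BUCKET = "movies-batch-demo"
KEY = "datasets/movies.zip"

s3 = boto3.client("s3")

# Upload the archive downloaded manually from Kaggle to the bucket the batch job reads from.
s3.upload_file("movies.zip", BUCKET, KEY)
print(f"Uploaded movies.zip to s3://{BUCKET}/{KEY}")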
Creating a batch job requires the provisioning of several of its components. To make this exercise reproducible, we will use Terraform for the infrastructure. The upcoming steps can be accomplished from the AWS console as well, or with other IaC tools such as CDK; Terraform is mainly a preference of mine.

Compute Environment

The first component of the batch job we will create is the compute environment. Our batch job will be a managed job running on AWS Fargate. We can write the IaC code for the compute environment as follows:

resource "aws_batch_compute_environment" "compute_environment" {
  compute_environment_name = var.module_name

  compute_resources {
    max_vcpus          = 4 # assumed value
    security_group_ids = [aws_security_group.sg.id]
    subnets            = [aws_subnet.private_subnet.id]
    type               = "FARGATE"
  }

  service_role = aws_iam_role.service_role.arn
  type         = "MANAGED"
  depends_on   = [aws_iam_role_policy_attachment.service_role_attachment]
}

We can notice in the resource definition that it requires a few other resources to be present. First, the compute environment needs a service role. According to the Terraform documentation, the service role "allows AWS Batch to make calls to other AWS services on your behalf". With all respect to the people who wrote the documentation, for me personally this statement does not offer a lot of information. In all fairness, the Terraform documentation offers an example of this service role, which we will use in our project:

data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["batch.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "service_role" {
  name               = "${var.module_name}-service-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy_attachment" "service_role_attachment" {
  role       = aws_iam_role.service_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole"
}

Essentially, what we are doing here is creating a role with an IAM policy offered by AWS, the name of the policy being AWSBatchServiceRole. Moreover, we create a trust policy to allow AWS Batch to assume this role.
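Once the compute environment (together with the networking it references, covered next) has been applied, we can check that it reports a healthy status. A small boto3 sketch, assuming a hypothetical environment name derived from var.module_name:

import boto3

batch = boto3.client("batch")

# Hypothetical name; it should match the compute environment name produced by Terraform.
response = batch.describe_compute_environments(
    computeEnvironments=["movies-batch-compute-environment"]
)
for env in response["computeEnvironments"]:
    # A healthy managed environment reports state ENABLED and status VALID.
    print(env["computeEnvironmentName"], env["state"], env["status"])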
Another important thing required by our compute environment is a list of security groups and subnets. I tie them together because they are part of the AWS networking infrastructure needed for the project. A security group is a stateful firewall, while a subnet is part of a virtual private network. Networking in AWS is a complex topic and it falls outside the scope of this article. Since AWS Batch requires the presence of a minimal networking setup, this is what we can use for our purposes:

resource "aws_vpc" "vpc" {
  cidr_block = "10.0.0.0/16" # assumed CIDR
  tags = {
    Name = "${var.module_name}-vpc"
  }
}

resource "aws_subnet" "public_subnet" {
  vpc_id     = aws_vpc.vpc.id
  cidr_block = "10.0.1.0/24" # assumed CIDR
  tags = {
    Name = "${var.module_name}-public-subnet"
  }
}

resource "aws_subnet" "private_subnet" {
  vpc_id     = aws_vpc.vpc.id
  cidr_block = "10.0.2.0/24" # assumed CIDR
  tags = {
    Name = "${var.module_name}-private-subnet"
  }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.vpc.id
  tags = {
    Name = "${var.module_name}-igw"
  }
}

resource "aws_eip" "eip" {
  vpc = true
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.eip.id
  subnet_id     = aws_subnet.public_subnet.id
  depends_on    = [aws_internet_gateway.igw]
  tags = {
    Name = "${var.module_name}-nat"
  }
}

resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
  tags = {
    Name = "${var.module_name}-public-rt"
  }
}

resource "aws_route_table" "private_rt" {
  vpc_id = aws_vpc.vpc.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
  tags = {
    Name = "${var.module_name}-private-rt"
  }
}

resource "aws_route_table_association" "public_rt_association" {
  subnet_id      = aws_subnet.public_subnet.id
  route_table_id = aws_route_table.public_rt.id
}

resource "aws_route_table_association" "private_rt_association" {
  subnet_id      = aws_subnet.private_subnet.id
  route_table_id = aws_route_table.private_rt.id
}

Now, this may seem like a lot of code. What is happening here is that we create an entirely new VPC with two subnets, a private and a public one. We put our cluster behind a NAT to be able to make calls out to the internet; this is required for our batch job to work properly, since it has to communicate with the AWS Batch API. Last but not least, for the security group we can use this:

resource "aws_security_group" "sg" {
  name        = "${var.module_name}-sg"
  description = "Movies batch demo SG"
  vpc_id      = aws_vpc.vpc.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

This is probably the simplest security group: it allows all outbound traffic while denying inbound traffic. Remember, security groups are stateful, so this should be perfect for our use case.

Job Queue

Now that we have the compute environment, we can create a job queue that will use this environment:

resource "aws_batch_job_queue" "job_queue" {
  name                 = "${var.module_name}-job-queue"
  state                = "ENABLED"
  priority             = 1 # assumed value
  compute_environments = [aws_batch_compute_environment.compute_environment.arn]
}

The definition of a queue is pretty simple: it needs a name, a state (enabled/disabled), a priority, and the compute environments to which it can schedule jobs. Next, we will need a job.

Job Definition

For a job definition we need to specify a few things. Let's see the resource definition first:

resource "aws_batch_job_definition" "job_definition" {
  name                  = "${var.module_name}-job-definition"
  type                  = "container"
  platform_capabilities = ["FARGATE"]

  container_properties = jsonencode({
    image = "${data.terraform_remote_state.ecr.outputs.ecr_registry_url}:latest"
    environment = [
      { name = "TABLE_NAME", value = var.table_name },
      { name = "BUCKET", value = var.bucket_name },
      { name = "FILE_PATH", value = var.file_path }
    ]
    fargatePlatformConfiguration = {
      platformVersion = "LATEST"
    }
    resourceRequirements = [
      { type = "VCPU", value = "1" },     # assumed value
      { type = "MEMORY", value = "2048" } # assumed value
    ]
    executionRoleArn = aws_iam_role.ecs_task_execution_role.arn
    jobRoleArn       = aws_iam_role.job_role.arn
  })
}

For the platform capabilities, the same job definition could target both FARGATE and EC2; in our case we need only FARGATE. For the container properties, we need to have a bunch of things in place. Probably the most important is the repository URL for the Docker image; we will build the Docker image in the next section. With resourceRequirements we configure the CPU and memory usage. These apply to the job itself, and they should fit inside the compute environment. Moving on, we can specify some environment variables for the container; we use these environment variables to pass input to the container. We could also override the CMD command of the Docker container and provide some input values there, but we are not doing that in this case.
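If a particular run ever does need different inputs, they can be overridden per submission rather than baked into the job definition. A hedged boto3 sketch with hypothetical names and values:

import boto3

batch = boto3.client("batch")

# Hypothetical names; the job definition already sets TABLE_NAME, BUCKET and FILE_PATH,
# but individual values can be overridden for a single run.
batch.submit_job(
    jobName="import-movies-sample",
    jobQueue="movies-job-queue",
    jobDefinition="movies-job-definition",
    containerOverrides={
        "command": ["python", "main.py"],
        "environment": [
            {"name": "FILE_PATH", "value": "datasets/movies-sample.zip"},
        ],
    },
)

This keeps the Terraform definition as the default and treats overrides as a per-run concern.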
Last but not least, we see that the job definition requires IAM roles. The first one is the execution role, which "grants the Amazon ECS container and AWS Fargate agents permission to make AWS API calls", according to the AWS Batch execution IAM role documentation. The second one is the job role, "an IAM role that the container can assume for AWS permissions", according to the ContainerProperties docs. Is this confusing for anybody else, or just for me? Probably yes, so let's clarify these roles.

The execution role grants permission for the ECS cluster and the ECS Fargate agent to make certain AWS API calls. These calls include getting the Docker image from an ECR repository or creating CloudWatch log streams:

data "aws_iam_policy_document" "assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecs_task_execution_role" {
  name               = "${var.module_name}-ecs-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

Since AWS already provides a policy for the execution role (AmazonECSTaskExecutionRolePolicy), we can reuse that.

The job role grants permissions to the running container itself. In our case, since we need to write entries to a DynamoDB table, we have to give the job write permission on that table. Likewise, since we read from an S3 bucket, we have to create a policy with S3 read permission as well:

resource "aws_iam_role" "job_role" {
  name               = "${var.module_name}-job-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
}

# DynamoDB table write policy
data "aws_iam_policy_document" "dynamodb_write_policy_document" {
  statement {
    actions = [
      "dynamodb:DeleteItem",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:BatchWriteItem",
      "dynamodb:UpdateItem"
    ]
    resources = [
      "arn:aws:dynamodb:${var.aws_region}:${data.aws_caller_identity.current.account_id}:table/${var.table_name}"
    ]
    effect = "Allow"
  }
}

resource "aws_iam_policy" "dynamodb_write_policy" {
  name   = "${var.module_name}-dynamodb-write-policy"
  policy = data.aws_iam_policy_document.dynamodb_write_policy_document.json
}

resource "aws_iam_role_policy_attachment" "dynamodb_write_policy_attachment" {
  role       = aws_iam_role.job_role.name
  policy_arn = aws_iam_policy.dynamodb_write_policy.arn
}

# S3 read-only bucket policy
data "aws_iam_policy_document" "s3_read_only_policy_document" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::${var.bucket_name}"]
    effect    = "Allow"
  }
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${var.bucket_name}/*"]
    effect    = "Allow"
  }
}

resource "aws_iam_policy" "s3_readonly_policy" {
  name   = "${var.module_name}-s3-readonly-policy"
  policy = data.aws_iam_policy_document.s3_read_only_policy_document.json
}

resource "aws_iam_role_policy_attachment" "s3_readonly_policy_attachment" {
  role       = aws_iam_role.job_role.name
  policy_arn = aws_iam_policy.s3_readonly_policy.arn
}

Both the execution role and the job role require a trust policy so they can be assumed by ECS.
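One thing the walkthrough takes for granted is the DynamoDB table itself (var.table_name). If it does not exist yet, it can be created with a minimal boto3 sketch; the table name and key schema below are assumptions, since the article never shows the table definition:

import boto3

dynamodb = boto3.client("dynamodb")

# Assumption: the table is keyed by the "id" column written by the ingestion script;
# the article itself never shows how the table is provisioned.
dynamodb.create_table(
    TableName="movies",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
dynamodb.get_waiter("table_exists").wait(TableName="movies")
print("Table movies is ready")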
Building the Docker Container for the Job

For our job to be complete, we need to build a Docker container with the so-called business logic. We can store this container in an AWS ECR repository or on DockerHub. What I usually tend to do is create a separate Terraform project for the ECR. The reason for this choice is that the Docker image should already exist in ECR at the moment the deployment of the AWS Batch job happens. The code for the ECR is very simple:

resource "aws_ecr_repository" "repository" {
  name                 = var.repo_name
  image_tag_mutability = "MUTABLE"
}

I also create an output with the Docker repository URL:

output "ecr_registry_url" {
  value = aws_ecr_repository.repository.repository_url
}

This output can be imported inside the other project and provided to the job definition:

data "terraform_remote_state" "ecr" {
  backend = "s3"
  config = {
    bucket = "tf-demo-states"
    key    = "aws-batch-demo-ecr" # assumed state key
    region = var.aws_region
  }
}

For the source code that does the ingesting of the movie ratings into DynamoDB, we can use the following Python snippet:

import csv
import io
import os
from zipfile import ZipFile

import boto3


def download_content(bucket, key):
    print(f"Downloading data from bucket {bucket}/{key}")
    s3 = boto3.resource("s3")
    response = s3.Object(bucket, key).get()
    print("Extracting data")
    zip_file = ZipFile(io.BytesIO(response["Body"].read()), "r")
    files = {name: zip_file.read(name) for name in zip_file.namelist()}
    return files.get(next(iter(files.keys())))


def write_to_dynamo(csv_content, table_name):
    print("Parsing csv data")
    reader = csv.DictReader(io.StringIO(bytes.decode(csv_content)))
    dynamo = boto3.resource("dynamodb")
    table = dynamo.Table(table_name)
    print(f"Starting to write data into table {table_name}")
    counter = 0
    with table.batch_writer() as batch:
        for row in reader:
            counter += 1
            batch.put_item(Item={
                "id": row["id"],
                "title": row["title"],
                "overview": row["overview"],
                "release_date": row["release_date"],
                "vote_average": row["vote_average"],
                "vote_count": row["vote_count"],
                "original_language": row["original_language"],
                "popularity": row["popularity"],
            })
            if counter % 1000 == 0:  # progress interval assumed
                print(f"Written {counter} items into table {table_name}")
    print(f"Finished writing data into {table_name}")


if __name__ == "__main__":
    bucket = os.environ.get("BUCKET")
    key = os.environ.get("FILE_PATH")
    table_name = os.environ.get("TABLE_NAME")
    is_env_missing = False
    if bucket is None:
        print("Environment variable BUCKET is not set")
        is_env_missing = True
    if key is None:
        print("Environment variable FILE_PATH is not set")
        is_env_missing = True
    if table_name is None:
        print("Environment variable TABLE_NAME is not set")
        is_env_missing = True
    if is_env_missing:
        print("Execution finished with one or more errors")
    else:
        content = download_content(bucket, key)
        write_to_dynamo(content, table_name)

This code is fairly self-explanatory: we get an archive with a CSV file from a bucket location, we extract that archive, and we iterate over the lines while doing batch inserts into DynamoDB. We can see that certain inputs, such as the bucket name, the archive path and the table name, are provided as environment variables.

For the Dockerfile, we can use the following:

FROM public.ecr.aws/docker/library/python:bullseye
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY main.py .
CMD python main.py

We build the image with the usual docker build (or buildx) command:

docker build --platform linux/amd64 -t movies-loader .

Note that the --platform flag is important if we are building on a MacBook with Apple Silicon, since AWS Batch does not support ARM (Graviton) yet. We can push the image to the ECR repository following the push commands shown in the AWS console.

Triggering a Batch Job

There are several ways to trigger batch jobs, since they are available as EventBridge targets. For our example we could have a scheduled EventBridge rule that is invoked periodically. To make my life easier and to be able to debug my job, I opted to create a simple Step Function. Step Functions are state machines used for serverless orchestration. They are a perfect candidate for managing running jobs, offering a way to easily see and monitor the running state of a job and to report its finishing status. We can implement the states of the Step Function using some JSON code (the timeout value below is an assumption):

resource "aws_sfn_state_machine" "sfn_state_machine" {
  name     = "${var.module_name}-sfn"
  role_arn = aws_iam_role.sfn_role.arn

  definition = <<EOF
{
  "Comment": "Run AWS Batch job",
  "StartAt": "Submit Batch Job",
  "TimeoutSeconds": 3600,
  "States": {
    "Submit Batch Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobName": "ImportMovies",
        "JobQueue": "${aws_batch_job_queue.job_queue.arn}",
        "JobDefinition": "${aws_batch_job_definition.job_definition.arn}"
      },
      "End": true
    }
  }
}
EOF
}

Like everything in AWS, Step Functions require an IAM role as well. The IAM role used in our example is similar to what is given in the AWS documentation:

data "aws_iam_policy_document" "sfn_policy" {
  statement {
    actions   = ["batch:SubmitJob", "batch:DescribeJobs", "batch:TerminateJob"]
    resources = ["*"]
    effect    = "Allow"
  }
  statement {
    actions = ["events:PutTargets", "events:PutRule", "events:DescribeRule"]
    resources = [
      "arn:aws:events:${var.aws_region}:${data.aws_caller_identity.current.account_id}:rule/StepFunctionsGetEventsForBatchJobsRule"
    ]
    effect = "Allow"
  }
}

resource "aws_iam_policy" "sfn_policy" {
  name   = "${var.module_name}-sfn-policy"
  policy = data.aws_iam_policy_document.sfn_policy.json
}

resource "aws_iam_role_policy_attachment" "sfn_policy_attachment" {
  role       = aws_iam_role.sfn_role.name
  policy_arn = aws_iam_policy.sfn_policy.arn
}

Our Step Function needs to be able to listen to and create CloudWatch Events; this is why the policy statement for the StepFunctionsGetEventsForBatchJobsRule rule is necessary (see this StackOverflow answer).
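With everything deployed, a run can be started and watched through the Step Functions API as well. A small boto3 sketch, assuming a hypothetical state machine ARN:

import time

import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical ARN; take the real one from the Terraform state or list_state_machines().
STATE_MACHINE_ARN = "arn:aws:states:eu-central-1:123456789012:stateMachine:movies-batch-sfn"

execution_arn = sfn.start_execution(stateMachineArn=STATE_MACHINE_ARN)["executionArn"]

# Because the state machine uses the batch:submitJob.sync integration, the execution
# stays RUNNING until the underlying batch job finishes.
while True:
    status = sfn.describe_execution(executionArn=execution_arn)["status"]
    print(f"Execution status: {status}")
    if status != "RUNNING":
        break
    time.sleep(30)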
Ultimately, we end up with a simplistic Step Function with only one intermediary state.

Conclusions

In this article we've seen a fairly in-depth introduction to the AWS Batch service. We also talked about when to use AWS Batch and when to consider other services that might be more adequate for the task at hand. We have also built a batch job from scratch using Terraform, Docker and Python.

In conclusion, I think AWS Batch is a powerful service that gets overshadowed by other offerings targeting more specific tasks. While the service itself abstracts away the provisioning of the underlying infrastructure, the whole setup process of a batch job can still be challenging, and the official documentation in many cases lacks clarity. Ultimately, if we don't want to get into the weeds, we can rely on a Terraform module maintained by the community to spin up a batch job. The source code used for this article can also be found on GitHub.

References: AWS Batch Documentation; Terraform Documentation: Compute Environment service role; Stateful Firewall; AWS Batch Execution IAM Role; AWS Batch Container Properties; Step Functions |
2023-03-19 11:38:37 |
Overseas News |
Japan Times latest articles |
Fujii becomes second player in shogi history to hold six major titles |
https://www.japantimes.co.jp/news/2023/03/19/national/fujii-wins-kio-title/
|
Fujii becomes second player in shogi history to hold six major titles. Twenty-year-old Sota Fujii became only the second player in the history of the shogi board game to hold six major titles after winning the Kio title. |
2023-03-19 20:57:18 |
News |
BBC News - Home |
Ukraine war: Putin pays visit to occupied Mariupol, state media reports |
https://www.bbc.co.uk/news/world-europe-65004610?at_medium=RSS&at_campaign=KARANGA
|
kremlin |
2023-03-19 11:15:48 |
News |
BBC News - Home |
Harrison: 'I had no option but to take Stephen Bear to court' |
https://www.bbc.co.uk/news/uk-64998904?at_medium=RSS&at_campaign=KARANGA
|
explicit |
2023-03-19 11:32:57 |
News |
BBC News - Home |
Gary Lineker will not present FA Cup coverage after losing his voice |
https://www.bbc.co.uk/news/uk-65005620?at_medium=RSS&at_campaign=KARANGA
|
impartiality |
2023-03-19 11:13:11 |
News |
BBC News - Home |
Boris Johnson: Ex-PM to reveal evidence in his defence over Partygate |
https://www.bbc.co.uk/news/uk-politics-65001385?at_medium=RSS&at_campaign=KARANGA
|
mislead |
2023-03-19 11:02:00 |
News |
BBC News - Home |
Credit Suisse bank: UBS is in talks to take over its troubled rival |
https://www.bbc.co.uk/news/business-65004605?at_medium=RSS&at_campaign=KARANGA
|
credit |
2023-03-19 11:18:10 |
News |
BBC News - Home |
Six Nations 2023: 'Unacceptable to spoil good games with poor red cards' |
https://www.bbc.co.uk/sport/rugby-union/65006193?at_medium=RSS&at_campaign=KARANGA
|
Six Nations 2023: 'Unacceptable to spoil good games with poor red cards' - Matt Dawson says referee Jaco Peyper's decision to send Freddie Steward off in England's defeat by Grand Slam-winning Ireland was disgraceful. |
2023-03-19 11:11:36 |