AWS |
AWS The Internet of Things Blog |
Creating Object Recognition with Espressif ESP32 |
https://aws.amazon.com/blogs/iot/creating-object-recognition-with-espressif-esp32/
|
Creating Object Recognition with Espressif ESP32: By using low-cost embedded devices like the Espressif ESP family and the breadth of AWS services, you can create an advanced object recognition system. The ESP32 microcontroller is a highly integrated solution for Wi-Fi and Bluetooth IoT applications with a minimal set of external components. In this example we use the AI-Thinker ESP32-CAM variant, which comes with an … |
2020-10-02 15:22:53 |
Program |
List of new questions in [all tags] | teratail |
Enlarging an image with a for loop in OpenCV |
https://teratail.com/questions/295622?rss=all
|
Enlarging an image with a for loop in OpenCV. |
2020-10-03 00:55:17 |
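The question above asks how to enlarge an image with explicit for loops in OpenCV. A minimal nearest-neighbor sketch, using a plain 2D list in place of the NumPy array that cv2.imread would return (the index arithmetic is the same either way):

```python
# Nearest-neighbor image enlargement with explicit for loops.
# A plain 2D list stands in for the image; in real OpenCV code you
# would apply the same index arithmetic to the array from cv2.imread.

def enlarge(img, scale):
    """Return img scaled up by an integer factor using nearest neighbor."""
    h, w = len(img), len(img[0])
    out = [[0] * (w * scale) for _ in range(h * scale)]
    for y in range(h * scale):
        for x in range(w * scale):
            # Each output pixel maps back to its source pixel.
            out[y][x] = img[y // scale][x // scale]
    return out

if __name__ == "__main__":
    src = [[1, 2],
           [3, 4]]
    for row in enlarge(src, 2):
        print(row)
```

In NumPy this whole loop collapses to fancy indexing or `cv2.resize(..., interpolation=cv2.INTER_NEAREST)`, but the explicit loops show what that interpolation does pixel by pixel.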
Program |
List of new questions in [all tags] | teratail |
Fetching a local API server inside getServerSideProps returns Not Found |
https://teratail.com/questions/295621?rss=all
|
Fetching a local API server inside getServerSideProps returns Not Found: In Next.js's getServerSideProps, when I try to fetch data from a Go backend API running on a local port, the request ends up as Not Found. |
2020-10-03 00:48:59 |
Program |
List of new questions in [all tags] | teratail |
About the git commit -am options |
https://teratail.com/questions/295620?rss=all
|
(No detail provided.) |
2020-10-03 00:41:11 |
Program |
List of new questions in [all tags] | teratail |
Using Crypto++ with Visual Studio 2013 |
https://teratail.com/questions/295619?rss=all
|
Using Crypto++ with Visual Studio: I get the error "only a class name or namespace name may be followed by '::'". I am studying file encryption and decryption in C++. |
2020-10-03 00:29:14 |
Program |
List of new questions in [all tags] | teratail |
Drawing straight lines with mplfinance |
https://teratail.com/questions/295618?rss=all
|
Drawing straight lines with mplfinance: mplfinance has been updated so that a candlestick chart can now be drawn easily from a DataFrame in the following format. |
2020-10-03 00:24:52 |
Program |
List of new questions in [all tags] | teratail |
Creating an app with React Native |
https://teratail.com/questions/295617?rss=all
|
reactnative |
2020-10-03 00:21:22 |
Program |
List of new questions in [all tags] | teratail |
I don't understand the straight ("kaidan") calculation with bit operations in Daifugo |
https://teratail.com/questions/295616?rss=all
|
I don't understand the straight ("kaidan") calculation with bit operations in Daifugo. Environment: Unity / C#. I would like to understand the straight-type calculations in Daifugo using the bit operations from the site below. |
2020-10-03 00:06:23 |
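The question above asks how bit operations can detect a straight (kaidan: three or more consecutive ranks) in Daifugo. The question's environment is Unity/C#, but the trick is language-neutral, so here is a Python sketch (the bit-to-rank mapping is an assumption for illustration):

```python
# Detecting a "kaidan" (straight of 3+ consecutive ranks) with a bitmask.
# Each bit of `hand` marks whether we hold that rank in one suit
# (here bit 0 = rank 3, bit 1 = rank 4, ..., a common Daifugo encoding).

def has_straight(hand: int, length: int = 3) -> bool:
    """True if `hand` contains `length` consecutive set bits."""
    run = hand
    for _ in range(length - 1):
        # Keep only bits that begin a run one longer than before.
        run &= run >> 1
    return run != 0

if __name__ == "__main__":
    print(has_straight(0b0001_1100))  # bits 2,3,4 set: a straight
    print(has_straight(0b0010_1010))  # alternating bits: no straight
```

After k AND-shift steps, a bit survives only if it started a run of k+1 consecutive set bits, so the whole check costs `length - 1` operations regardless of hand size; the same expression works unchanged in C#.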
Program |
List of new questions in [all tags] | teratail |
Behavior in Flutter when the fetched data is less than requested |
https://teratail.com/questions/295615?rss=all
|
Behavior in Flutter when the fetched data is less than requested: In Flutter, when fetching several images, I want the front-end screen to prepare grayed-out placeholder boxes and then render each image into its box once it has been fetched. |
2020-10-03 00:00:39 |
Program |
List of new questions in [all tags] | teratail |
About applying a double effect to a layer mask in Photoshop |
https://teratail.com/questions/295614?rss=all
|
About applying a double effect to a layer mask in Photoshop: there is something about Photoshop I don't understand. |
2020-10-03 00:00:38 |
AWS |
New posts tagged AWS - Qiita |
[SOA exam prep] CloudWatch |
https://qiita.com/Kouichi_Itagaki/items/9043e65205552d8200c3
|
A service that aggregates logs from various AWS resources into CloudWatch Logs dashboards, analyzes the logs, monitors their state using metric filters, and fires an alarm on specific conditions so that errors can be detected. |
2020-10-03 00:09:53 |
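The summary above describes the CloudWatch Logs pipeline of metric filters feeding alarms. A toy sketch of that flow in plain Python rather than the AWS API (the pattern, threshold, and state names are illustrative; a real setup would use the PutMetricFilter and PutMetricAlarm API calls):

```python
import re

# Toy model of a CloudWatch Logs metric filter plus alarm: the filter
# counts matching log events per period, and the alarm compares that
# count against a threshold.  Pattern and threshold are illustrative,
# not a real AWS configuration.

FILTER_PATTERN = re.compile(r"\bERROR\b")
ALARM_THRESHOLD = 2  # go to ALARM when more than 2 errors in the period

def metric_from_logs(lines):
    """Metric filter: number of log events matching the pattern."""
    return sum(1 for line in lines if FILTER_PATTERN.search(line))

def alarm_state(metric_value, threshold=ALARM_THRESHOLD):
    """Alarm: OK at or below the threshold, ALARM above it."""
    return "ALARM" if metric_value > threshold else "OK"

if __name__ == "__main__":
    logs = [
        "INFO  request handled in 12ms",
        "ERROR timeout calling downstream",
        "ERROR retry exhausted",
        "ERROR giving up",
    ]
    count = metric_from_logs(logs)
    print(count, alarm_state(count))  # 3 ALARM
```

The real service evaluates the count per period and supports comparison operators and evaluation windows; this sketch only shows the filter-then-threshold shape of the mechanism.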
Docker |
New posts tagged docker - Qiita |
Ensuring reproducibility with Keras in TensorFlow 2.3 |
https://qiita.com/temple1026/items/05546696f5dc9828e270
|
Ensuring reproducibility with Keras in TensorFlow 2.3. Introduction: when training a model in TensorFlow, fixing the random seed, as described in articles like the ones below, is a known way to obtain reproducible results; in my TensorFlow 2.3 environment, however, fixing the seed alone did not give the same result on every run. (The post references several articles on fixing random seeds in TensorFlow 2.x / tf.keras, ensuring reproducibility in Keras, and making GPU computation reproducible.) |
2020-10-03 00:53:27 |
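The Qiita post above concerns making Keras/TensorFlow training reproducible by pinning every random source at once. Below is a minimal, library-free sketch of that seed-pinning pattern using only the standard library; the NumPy/TensorFlow calls a real setup would add are noted in comments as assumptions to check against your TF version:

```python
import os
import random

def set_global_seeds(seed: int) -> None:
    """Pin the random sources a typical training script touches.

    In a real TensorFlow/Keras setup you would additionally call
    (assumed API, verify for your TF version):
        np.random.seed(seed)
        tf.random.set_seed(seed)
    and, for GPU runs, consider os.environ["TF_DETERMINISTIC_OPS"] = "1",
    since seed-fixing alone may not make GPU kernels deterministic,
    which is the problem the post describes.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

def sample_run(seed: int, n: int = 5):
    """Draw n pseudo-random numbers after seeding."""
    set_global_seeds(seed)
    return [random.randint(0, 99) for _ in range(n)]

if __name__ == "__main__":
    # Same seed, identical sequence across runs.
    print(sample_run(42) == sample_run(42))  # True
```

The point the article makes is that in TF 2.x this list of seeds is necessary but not always sufficient: nondeterministic GPU ops can still vary between runs even with all seeds fixed.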
Tech Blog |
Developers.IO |
[Report] CUS-86: Revitalizing commerce through a revolution in credit systems. Net Protections' microservices strategy for payments and finance using Amazon ECS and Amazon SageMaker #AWSSummit |
https://dev.classmethod.jp/articles/aws-summit-online-2020-session-report-cus-86/
|
[Report] CUS-86: Revitalizing commerce through a revolution in credit systems. |
2020-10-02 15:32:00 |
Overseas TECH |
Ars Technica |
Nearly 20,000 workers have had COVID-19, Amazon admits |
https://arstechnica.com/?p=1711245
|
amazon |
2020-10-02 15:53:38 |
Overseas TECH |
Ars Technica |
SpaceX, Northrop seek to break launch gremlin curse with Friday night attempts |
https://arstechnica.com/?p=1711240
|
attemptsspacex |
2020-10-02 15:35:30 |
Apple |
AppleInsider - Frontpage News |
How to use Picture in Picture in tvOS 14 |
https://appleinsider.com/articles/20/10/02/how-to-use-picture-in-picture-in-tvos-14
|
How to use Picture in Picture in tvOS 14: Apple TV's updated Picture in Picture feature is useful, but it doesn't work with everything you want to watch, and it's a little awkward to use. With the updated Picture in Picture feature you can stream security cameras or watch two videos at once. Once your Apple TV has the new tvOS 14 or later, you can send what you're watching into a corner of the screen. It carries on playing as you start an Apple Arcade game, as you search the App Store for something, or when you are just looking for anything better to watch. |
2020-10-02 15:35:38 |
Apple |
AppleInsider - Frontpage News |
Today only: 8-core 15-inch MacBook Pro drops to $1,849 ($950 off) |
https://appleinsider.com/articles/20/10/02/today-only-8-core-15-inch-macbook-pro-drops-to-1849-950-off
|
Today only: 8-core 15-inch MacBook Pro drops to $1,849 ($950 off). Amazon-owned Woot's latest flash deal offers substantial savings on Apple's 15-inch MacBook Pro equipped with an 8-core Intel Core processor, SSD storage and upgraded graphics. The daily deal takes $950 off Apple's 15-inch MacBook Pro, bringing the price down to $1,849. These units are refurbished by Apple, but come with a Woot warranty in lieu of an Apple warranty and are packaged in a generic white box. |
2020-10-02 15:25:21 |
Apple |
AppleInsider - Frontpage News |
Apple TV+ review: 'Tiny World' gets back to nature, with Paul Rudd narrating |
https://appleinsider.com/articles/20/10/02/apple-tv-review-tiny-world-gets-back-to-nature-with-paul-rudd-narrating
|
Apple TV+ review: 'Tiny World' gets back to nature, with Paul Rudd narrating. Ant-Man actor Paul Rudd hosts a beautifully rendered take on the smaller side of the animal kingdom in Tiny World, premiering Friday exclusively on Apple TV+. Tiny World, somewhat improbably, is not the first Apple TV+ original project to feature an extreme closeup of a small dung beetle diving at a fresh piece of elephant dung; this also happened in The Elephant Queen, the documentary that debuted on the service around the time of its launch late last year. |
2020-10-02 15:55:35 |
Apple |
AppleInsider - Frontpage News |
New 5G iPhone SE with dual-lens camera in 2022, ProMotion in 'iPhone 13' display analyst says |
https://appleinsider.com/articles/20/10/02/new-5g-iphone-se-with-dual-lens-camera-in-2022-promotion-in-iphone-13-display-analyst-says
|
New 5G iPhone SE with dual-lens camera in 2022, ProMotion in 'iPhone 13' display, analyst says. A new rumor chimes in on ProMotion arriving with the 'iPhone 13,' and Apple may not release a new iPhone SE until 2022, but it will arrive with 5G support, a dual-camera setup and a larger display when it does. (Credit: Andrew O'Hara, AppleInsider.) According to display expert Ross Young, Apple won't release a new iPhone SE model next spring; instead, a successor to the low-cost iPhone arrives in the spring of 2022. |
2020-10-02 15:54:47 |
Overseas TECH |
Engadget |
Amazon Music HD is adding thousands more Ultra HD songs and albums |
https://www.engadget.com/amazon-universal-warner-music-group-remaster-hd-songs-155318511.html
|
Amazon Music HD is adding thousands more Ultra HD songs and albums. Amazon introduced its high-res music streaming tier, Amazon Music HD, last fall. Now it says the service is about to get a whole lot better: Amazon Music is teaming up with Universal Music Group and Warner Music Group to remaster thousands of songs. |
2020-10-02 15:53:18 |
Overseas TECH |
Engadget |
Apple Watch Series 6 review: The best new features are the boring ones |
https://www.engadget.com/apple-watch-series-6-review-153047133.html
|
Apple Watch Series 6 review: The best new features are the boring ones. This fall marks the fifth anniversary of the original Apple Watch. Other than the basic design itself, a square display with a digital crown, and a mostly familiar lineup of wrist straps, a lot has changed. Gone is the solid gold edition that… |
2020-10-02 15:30:56 |
Overseas TECH |
Engadget |
'Fall Guys' season 2 begins October 8th |
https://www.engadget.com/fall-guys-season-2-launch-date-october-8-150844239.html
|
'Fall Guys' season 2 begins October 8th. Good news for fans of the wildly popular game Fall Guys: the game's second season will be launching on October 8th, according to its Twitter account. That means those who have been waiting for more rounds have less than a week left to wait. The game… |
2020-10-02 15:08:44 |
Cisco |
Cisco Blog |
Cisco Named a Leader in Aragon Research Globe for Team Collaboration 2020 |
https://blogs.cisco.com/collaboration/cisco-named-a-leader-in-aragon-research-globe-for-team-collaboration-2020
|
Cisco Named a Leader in Aragon Research Globe for Team Collaboration 2020. This week, industry analyst firm Aragon Research published their annual Aragon Research Globe for Team Collaboration, and I am thrilled to announce that Cisco has again been identified as a Leader. |
2020-10-02 15:51:40 |
Cisco |
Cisco Blog |
Disruption Leads to Innovation: Cisco at NVIDIA GTC |
https://blogs.cisco.com/partner/disruption-leads-to-innovation-cisco-at-nvidia-gtc
|
Disruption Leads to Innovation: Cisco at NVIDIA GTC. Innovation has emerged front and center as the means to cope with epic change. One example is the Virtual Workstation solution with Cisco UCS and NVIDIA GPUs; these Virtual Workstations are giving employees the ability to work on highly complex and graphics-intensive applications remotely. Where do you learn the most up-to-date information on innovation? The answer is NVIDIA's GPU Technology Conference. |
2020-10-02 15:00:48 |
Overseas TECH |
CodeProject Latest Articles |
Building a Database Application in Blazor - Part 2 - Services - Building the CRUD Data Layers |
https://www.codeproject.com/Articles/5279596/Building-a-Database-Application-in-Blazor-Part-2-S
|
application |
2020-10-02 15:55:00 |
Overseas TECH |
CodeProject Latest Articles |
A Beginner's Tutorial for Understanding and Implementing a CRUD APP using Elasticsearch and C# - Part 2 |
https://www.codeproject.com/Articles/1033116/A-Beginners-Tutorial-for-Understanding-and-Imple-3
|
integration |
2020-10-02 15:55:00 |
Overseas TECH |
WIRED |
I'm Done Being Mistaken for Jeff Bezos and MacKenzie Scott |
https://www.wired.com/story/done-being-mistaken-jeff-bezos-mackenzie-scott
|
address |
2020-10-02 15:56:30 |
Overseas News |
Japan Times latest articles |
Brazil’s Amazon sees nearly two-thirds more fires than last September |
https://www.japantimes.co.jp/news/2020/10/02/world/brazil-amazon-two-thirds-more-fires/
|
Brazil's Amazon sees nearly two-thirds more fires than last September. Satellites used by the National Institute of Space Research detected nearly two-thirds more fire outbreaks in the Amazon last month than in the same month last year. |
2020-10-03 00:37:57 |
News |
BBC News - Home |
Trump Covid: US president has mild symptoms - White House |
https://www.bbc.co.uk/news/world-us-canada-54391986
|
president |
2020-10-02 15:24:54 |
News |
BBC News - Home |
Covid: Growth in Covid cases 'may be levelling off' |
https://www.bbc.co.uk/news/health-54387057
|
previous |
2020-10-02 15:38:16 |
News |
BBC News - Home |
Brexit: EU calls for trade talks to 'intensify' ahead of call with UK |
https://www.bbc.co.uk/news/uk-54384437
|
boris |
2020-10-02 15:16:45 |
News |
BBC News - Home |
London Marathon: Kenenisa Bekele to miss race because of injury |
https://www.bbc.co.uk/sport/athletics/54386018
|
London Marathon: Kenenisa Bekele to miss race because of injury. Kenenisa Bekele's much-anticipated London Marathon duel with Eliud Kipchoge is off, as the Ethiopian pulls out of Sunday's race with a calf injury. |
2020-10-02 15:16:39 |
Hokkaido |
Hokkaido Shimbun |
Kiryu wins his second title in 10.27 seconds - Athletics, Japan Championships Day 2 |
https://www.hokkaido-np.co.jp/article/466702/
|
Japan Championships |
2020-10-03 00:13:24 |
Hokkaido |
Hokkaido Shimbun |
Levanga open the season against Nagoya D on the 3rd |
https://www.hokkaido-np.co.jp/article/466711/
|
Season opener |
2020-10-03 00:03:04 |
GCP |
Cloud Blog |
Gauge the effectiveness of your DevOps organization running in Google Cloud |
https://cloud.google.com/blog/products/devops-sre/another-way-to-gauge-your-devops-performance-according-to-dora/
|
Gauge the effectiveness of your DevOps organization running in Google Cloud

Editor's note: There are many ways to skin the DevOps cat. Google Cloud Developer Programs Engineer Dina Graves Portman recently wrote about how to evaluate your DevOps effectiveness using the open-source Four Keys project. Here, Google Customer Engineer Brian Kaufman shows you how to do the same thing for an application that runs entirely on Google Cloud.

Many organizations aspire to become true high-functioning DevOps shops, but it can be hard to know where you stand. According to DevOps Research and Assessment (DORA), you can prioritize just four metrics to measure the effectiveness of your DevOps organization: two that measure speed and two that measure stability.

Speed:
- Lead Time for Changes: code commit to code in production
- Deployment Frequency: how often you push code

Stability:
- Change Failure Rate: rate of deployment failures in production that require immediate remedy (a rollback or manual change)
- Time to Restore Service (MTTR): mean time to recovery

In this post we present a methodology to collect these four metrics from software delivery pipelines and applications deployed in Google Cloud. You can then use those metrics to rate your overall practice effectiveness, baseline your organization's performance against DORA industry benchmarks, and determine whether you're an Elite, High, Medium or Low performer. Let's take a look at how to do this in practice with a sample architecture running on Google Cloud.

Services and reference architecture

To get started, we create a CI/CD pipeline with the following cloud services:
- GitHub code repo
- Cloud Build, a container-based CI/CD tool
- Container Registry
- Google Kubernetes Engine (GKE)
- Cloud Load Balancing, used as an ingress controller for GKE
- Cloud Uptime Checks, for synthetic application monitoring
- Cloud Monitoring
- Cloud Functions
- Pub/Sub, used as a message bus to connect alerts to Cloud Functions

These are combined into the reference architecture below. Note that all of these Google Cloud services are integrated with Cloud Monitoring; as such, there's nothing in particular that you need to set up to receive service logs, and many of these services have built-in metrics that we'll use in this post.

Measuring speed

To measure our two speed metrics, deployment frequency and lead time for changes, we instrument Cloud Build, which is a continuous integration and continuous delivery tool. As a container-based CI/CD tool, Cloud Build lets you load a series of Google-managed or community-managed Cloud Builders to manipulate your code or interact with internal and external services during the build and deployment process. Upon firing a build trigger, Cloud Build reaches into our Git repository for our source code, creates a container image artifact that it pushes to Container Registry, and then deploys the container image to a GKE cluster. You can also import your own cloud builder container into the process and insert it as the final build step to determine the time from commit to deployment, as well as whether the deployment is a rollback.

For this example, we've created a custom container to be used as the last build step that:
1. Retrieves the payload binding for the commit timestamp, accessed by the variable $(push.repository.pushed_at), and compares it against the current timestamp to calculate lead time. The payload binding variable is used when we create the trigger and is referenced by a custom variable _MERGE_TIME in cloudbuild.yaml.
2. Reaches into the source repo to get the commit ID of the latest commit on the master branch and compares it to the current commit ID of the build to determine whether the build is a rollback or a match.

A reference Cloud Build config YAML shows each build step described above. If you're using a non-built-in variable like _MERGE_TIME (a payload binding) in your config file, you need to map that variable to the $(push.repository.pushed_at) value when you set up the Cloud Build trigger. After the build step for this custom cloud builder runs, the commit ID, rollback value and lead-time value are written to the Cloud Build logs, which are fed automatically into Cloud Monitoring.

Next, we create a log-based metric in Cloud Logging to absorb these custom values. Log-based metrics can be based on filters for specific log entries. Once we have our log-entry filter, we use regular expressions assigned to particular pieces of the output logs to capture specific sections of each log entry into the metric: we create labels for the commit name and rollback value, which attach to the lead-time value that shows up in the textPayload field of our log.

Lead Time for Changes: Once we have the above metric and labels created from our Cloud Build log, we can access it in the Cloud Operations Metrics Explorer via logging/user/dorametrics (DoraMetrics was the name we gave our log-based metric). The value of the metric is the lead time as extracted by the regular expression above, with rollbacks filtered out; we use the median (50th percentile).

Deployment Frequency: Now that we have the lead time for each commit, we can determine the frequency of deployments by simply counting the number of lead times recorded in a window.

Measuring stability

Change Failure Count: To determine the number of software rollbacks performed, we look at our deployment metric and filter for Rollback=True, which gives us a count of the total rollbacks performed. To determine the Change Failure Rate, we divide this count by the Deployment Frequency metric collected above for the same window.

Mean Time to Resolution (MTTR): In typical enterprise environments there are incident response systems that record when an issue was reported and when it was ultimately resolved. Assuming these times can be queried, MTTR is the average time between the reported and resolved timestamps of the issues. In this blog we use automation to alert on and graph issues, which allows us to gather more accurate service-disruption metrics. Our strategy uses Service Level Objectives (SLOs), each of which combines a Service Level Indicator (SLI), a metric we've determined represents our customers' happiness with our application, with an objective. When we violate an SLO, we consider the mean time to restore service to be the total time it takes to detect, mitigate and resolve the problem until we are back in compliance with the SLO.

For simplicity, we've highlighted one metric we feel represents our customer satisfaction overall: HTTP error response codes from our website. The ratio of this metric against the total response codes sent over a given time window constitutes our SLI. For total errors, we monitor response codes returned from our front-end load balancer, which is set up as an ingress controller in our GKE cluster (metric used: loadbalancing.googleapis.com/https/request_count, grouped by response code).

Using this metric, we build our SLI and wrap it into an SLO that represents the customer satisfaction observed over a longer time window. Using the SLO API, we create custom SLOs that represent the level of customer satisfaction we want to monitor, where being in violation of the SLO indicates an issue. In this example, we've created a custom service to represent our application and an SLO on HTTP load-balancer response codes: it assumes a quality-of-service level in which most responses from the load balancer should not be errors in a given day, which automatically creates an error budget over that period.

Now, when it comes to monitoring for MTTR, we have a metric (SLI) attached to a service level (SLO) that represents quality of service over a given window of time. We set up an alert policy that fires when we are in danger of violating this SLO; this also starts a timer to calculate the time to resolution. What we're measuring here is referred to as burn rate: how much of our error budget the current SLI is eating up. The window we measure for our alert is much smaller than our entire SLO, so when the SLI has moved back within the compliance threshold, another alert fires indicating that the incident has cleared.

You can send alerts out through a variety of channels, allowing you to integrate with existing ticketing or messaging systems to record the MTTR in a way that makes sense for your organization. For our purposes, we integrate with the Pub/Sub channel, sending the alerts to a Cloud Function that performs the necessary charting calculations. The JSON payload of the clearing alert contains the started_at and ended_at timestamps; we use these timestamps in our Cloud Function to calculate the time to resolve the issue and then output it to the logs. The final step is to create another log-based metric, using a regular expression that captures the "Time to Resolve" value printed to our Cloud Functions log, after which the metric is available in Cloud Operations.

Conclusion

We've shown how you can create custom cloud builders in Cloud Build to generate metrics relating to deployment frequency, lead time and rollbacks that appear in Cloud Operations logs; how to use SLOs and SLIs to generate and push alerts to your Cloud Functions logs; and how to use log-based metrics to pull these values out of the logs and chart them. These metrics can be used to evaluate the effectiveness of your organization's software development and delivery pipelines over time, as well as to benchmark your performance against the greater DevOps community. Where does your organization land?

For more inspiration, here is some further reference material to help you measure the effectiveness of your own DevOps organization: the Google Cloud Application Modernization Program (blog); Setting SLOs: a step-by-step guide (blog); Setting SLOs: observability using custom metrics (blog); Concepts in Service Monitoring (documentation); Working with the SLO API (documentation); How to create SLOs in the GCP Console (video); How to create SLOs at scale with the SLO API (video); How to create SLOs using custom metrics (video); the GitHub SLO API code used for this blog; the DORA Quick Check; the Four Keys project for DORA metric ingestion into BigQuery; and "New ways we're improving observability with Cloud Ops" (blog). Related article: "Are you an Elite DevOps performer? Find out with the Four Keys Project" explains how the Four Keys open-source project lets you gauge your DevOps performance according to DORA metrics. |
2020-10-02 16:00:00 |
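The four DORA computations in the article above boil down to simple arithmetic over timestamps. A minimal sketch over in-memory records (the record layout, with pushed_at/deployed_at/rollback fields echoing the values the article's custom build step logs, is an assumption for illustration):

```python
from datetime import datetime
from statistics import median

# Each record mirrors what the article's custom Cloud Build step logs:
# commit time, deploy time, and whether the deploy was a rollback.
deploys = [
    {"pushed_at": datetime(2020, 10, 1, 9, 0),
     "deployed_at": datetime(2020, 10, 1, 9, 30), "rollback": False},
    {"pushed_at": datetime(2020, 10, 1, 14, 0),
     "deployed_at": datetime(2020, 10, 1, 15, 0), "rollback": False},
    {"pushed_at": datetime(2020, 10, 2, 11, 0),
     "deployed_at": datetime(2020, 10, 2, 11, 20), "rollback": True},
]

def lead_time_minutes(records):
    """Median lead time (commit to production), rollbacks excluded."""
    times = [(r["deployed_at"] - r["pushed_at"]).total_seconds() / 60
             for r in records if not r["rollback"]]
    return median(times)

def change_failure_rate(records):
    """Share of deployments in the window that were rollbacks."""
    return sum(r["rollback"] for r in records) / len(records)

def mttr_minutes(incidents):
    """Mean minutes between alert started_at and ended_at timestamps."""
    spans = [(end - start).total_seconds() / 60 for start, end in incidents]
    return sum(spans) / len(spans)

if __name__ == "__main__":
    print(lead_time_minutes(deploys))    # median of 30 and 60 minutes
    print(len(deploys))                  # deployment frequency in the window
    print(change_failure_rate(deploys))  # 1 rollback out of 3 deploys
    incidents = [(datetime(2020, 10, 2, 12, 0),
                  datetime(2020, 10, 2, 12, 30))]
    print(mttr_minutes(incidents))
```

In the article these same calculations are performed by log-based metrics and Metrics Explorer aggregations rather than application code; the sketch just makes the arithmetic behind each chart explicit.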
GCP |
Cloud Blog |
Toward automated tagging: bringing bulk metadata into Data Catalog |
https://cloud.google.com/blog/products/data-analytics/best-practices-for-bulk-ingestion-of-metadata-to-cloud-data-catalog/
|
Toward automated tagging: bringing bulk metadata into Data Catalog

Data Catalog lets you ingest and edit business metadata through an interactive interface, and it includes programmatic interfaces that can be used to automate your common tasks. Many enterprises have to define and collect a set of metadata using Data Catalog, so we'll offer some best practices here on how to declare, create and maintain this metadata in the long run.

In our previous post, we looked at how tag templates can facilitate data discovery, governance and quality control by describing a vocabulary for categorizing data assets. In this post, we'll explore how to tag data using tag templates. Tagging refers to creating an instance of a tag template and assigning values to the fields of the template in order to classify a specific data asset. As of this writing, Data Catalog supports three storage back ends: BigQuery, Cloud Storage and Pub/Sub. We'll focus here on tagging assets that are stored on those back ends, such as tables, columns, files and message topics.

We'll describe three usage models that are suitable for tagging data within a data lake and data warehouse environment: provisioning a new data source, processing derivative data, and updating tags and templates. For each scenario, you'll see our suggested approach for tagging data at scale.

Provisioning data sources

Provisioning a data source typically entails several activities: creating tables or files (depending on the storage back end), populating them with some initial data, and setting access permissions on those resources. We add one more activity to this list: tagging the newly created resources in Data Catalog. Here's what that step entails. Tagging a data source requires a domain expert who understands both the meaning of the tag templates to be used and the semantics of the data in the data source. Based on their knowledge, the domain expert chooses which templates to attach, as well as what type of tag to create from those templates. It is important for a human to be in the loop, given that many decisions rely on the accuracy of the tags.

We've observed two types of tags, based on our work with clients. One type is referred to as static, because the field values are known ahead of time and are expected to change only infrequently. The other type is referred to as dynamic, because the field values change on a regular basis, based on the contents of the underlying data. An example of a static tag is the collection of data governance fields that includes data_domain, data_confidentiality and data_retention: the values of those fields are determined by an organization's data usage policies, are typically known by the time the data source is created, and do not change frequently. An example of a dynamic tag is the collection of data quality fields such as number_values, unique_values, min_value and max_value: those field values are expected to change frequently, whenever a new load runs or modifications are made to the data source.

In addition to these differences, static tags also have a cascade property that indicates how their fields should be propagated from source to derivative data; we'll expand on this concept in a later section. By contrast, dynamic tags have a query expression and a refresh property, indicating the query that should be used to calculate the field values and the frequency with which they should be recalculated. The first code snippet shows an example config for a static tag (YAML-based static tag config) and the second shows one for a dynamic tag (YAML-based dynamic tag config).

As mentioned earlier, a domain expert provides the inputs to those configs when setting up the tagging for the data source. More specifically, they first select the templates to attach to the data source; second, they choose the tag type to use, namely static or dynamic; third, they input the values of each field and their cascade setting if the type is static, or the query expression and refresh setting if the type is dynamic. These inputs are provided through a UI, so the domain expert doesn't need to write raw YAML files. Once the YAML files are generated, a tool parses the configs and creates the actual tags in Data Catalog based on the specifications; the tool also schedules the recalculation of dynamic tags according to the refresh settings. While a domain expert is needed for the initial inputs, the actual tagging tasks can be completely automated. We recommend following this approach, so that newly created data sources are not only tagged upon launch, but the tags are also maintained over time without the need for manual labor.

Processing derivative data

In addition to tagging data sources, it's important to be able to tag derivative data at scale. We define derivative data in broad terms, as any piece of data created from a transformation of one or more data sources. This type of data is particularly prevalent in data lake and warehousing scenarios, where data products are routinely derived from various data sources. The tags for derivative data should consist of the origin data sources and the transformation types applied to the data: the origin data sources' URIs are stored in the tag, together with one or more transformation types, namely aggregation, anonymization, normalization, etc.

We recommend baking the tag creation logic into the pipeline that generates the derived data; this is doable with Airflow DAGs and Beam pipelines. For example, if a data pipeline is joining two data sources, aggregating the results and storing them in a table, you can create a tag on the result table with references to the two origin data sources and aggregation=true. (See the code snippet of a Beam pipeline with tagging logic that creates such a tag.)

Once you've tagged derivative data with its origin data sources, you can use this information to propagate the static tags attached to those origin data sources. This is where the cascade property comes into play: it indicates which fields should be propagated to derivative data. An example of the cascade property is shown in the first code snippet above, where the data_domain and data_confidentiality fields are both to be propagated, whereas the data_retention field is not. This means that any derived tables in BigQuery will be tagged with data_domain=HR and data_confidentiality=CONFIDENTIAL using the dg template.

Handling updates

There are several scenarios that require update capabilities for both tags and templates. For example, if a business analyst discovers an error in a tag, one or more values need to be corrected; if a new data usage policy is adopted, new fields may need to be added to a template and existing fields renamed or removed. We provide configs for tag and template updates, as shown in the figures below (YAML-based tag update config; YAML-based template update config). The tag update config specifies the current and new values for each field that is changing; the tool processes the config and updates the field values in the tag based on the specification. If the updated tag is static, the tool also propagates the changes to the same tags on derivative data. The template update config specifies the field name, field type and any enum value changes; the tool processes the update by first determining the nature of the changes. As of this writing, Data Catalog supports field additions and deletions to templates, as well as enum value additions, but field renamings and type changes are not yet supported. As a result, the tool modifies the existing template if a simple addition or deletion is requested; otherwise, it has to recreate the entire template and all of its dependent tags.

We've started prototyping these approaches in order to release an open-source tool that automates many of the tasks involved in creating and maintaining tags in Data Catalog in accordance with our proposed usage model. Keep an eye out for it; in the meantime, learn more about Data Catalog tagging. |
2020-10-02 16:00:00 |
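The cascade behavior the post above describes, where only some static-tag fields propagate to derivative data, can be illustrated in a few lines of Python over plain dicts (the field names come from the post's example; the dict shape of the config and the data_retention value are assumptions for illustration):

```python
# Propagating static-tag fields to derivative data according to the
# cascade property described in the post.  Field names follow the
# post's example (data_domain, data_confidentiality, data_retention);
# the config's dict shape and the "90_days" value are illustrative.

source_tag = {
    "data_domain": {"value": "HR", "cascade": True},
    "data_confidentiality": {"value": "CONFIDENTIAL", "cascade": True},
    "data_retention": {"value": "90_days", "cascade": False},
}

def cascade_fields(tag):
    """Build the tag for a derivative asset: keep only cascading fields."""
    return {name: spec["value"]
            for name, spec in tag.items() if spec["cascade"]}

if __name__ == "__main__":
    print(cascade_fields(source_tag))
    # {'data_domain': 'HR', 'data_confidentiality': 'CONFIDENTIAL'}
```

This matches the post's outcome: derived tables inherit data_domain=HR and data_confidentiality=CONFIDENTIAL, while data_retention, whose cascade flag is off, stays on the source only. In the described tooling this filter would run inside the pipeline (Airflow DAG or Beam transform) that writes the derivative asset.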