Posted 2022-05-01 02:11:09  RSS feed 2022-05-01 02:00 digest (12 items)

Category | Site | Article title / trending word | Link URL | Frequent words / summary / search volume | Date registered
Docker New posts tagged "docker" - Qiita Can't push a large Docker image over au Hikari https://qiita.com/HarutakaMatusmoto/items/6454f7b1ce05a4584ae5 gdockerfor 2022-05-01 01:25:55
Overseas TECH MakeUseOf What Does FPS Mean in Gaming? https://www.makeuseof.com/video-games-fps-meaning/ explore 2022-04-30 16:45:14
Overseas TECH MakeUseOf Get Your Heart Pumping With These 8 Aerobics Workout Apps https://www.makeuseof.com/get-heart-pumping-aerobics-workout-apps/ workouts 2022-04-30 16:30:13
Overseas TECH MakeUseOf Why You Should Blur Your House on Google Street View (and How) https://www.makeuseof.com/blur-your-house-google/ Why You Should Blur Your House on Google Street View (and How): Shocked to learn that your house is on Google Maps' Street View? Here's how to blur out your house on Google Maps and stay private online. 2022-04-30 16:30:14
Overseas TECH MakeUseOf How to Back Up and Restore Windows 10 Device Drivers https://www.makeuseof.com/windows-10-back-up-restore-drivers/ windows 2022-04-30 16:15:14
Overseas TECH MakeUseOf How to Use Snapchat's Image Editing Tools https://www.makeuseof.com/how-to-use-snapchat-image-editing/ tools 2022-04-30 16:15:13
Overseas TECH MakeUseOf SwitchBot Curtain Review: Make Curtains Great Again! https://www.makeuseof.com/switchbot-curtain-review/ devices 2022-04-30 16:05:13
Overseas TECH DEV Community The design concept of an almighty Opensource project about machine learning platform https://dev.to/qazmkop/the-design-concept-of-an-almighty-opensource-project-about-machine-learning-platform-46p

The design concept of an almighty Opensource project about machine learning platform

In my previous article, "Almighty Opensource project about machine learning you should try out", I introduced MetaSpore. In this article I will introduce MetaSpore's design philosophy in detail; next week I will post the detailed steps for using MetaSpore to build an industrial recommendation system rapidly. How did MetaSpore take complex recommendation algorithms beyond the Internet giants and bring them to the vast majority of SMEs and developers? This article unveils MetaSpore's core design philosophy.

What is a one-stop machine learning platform

When it comes to machine learning, people tend to think of frameworks such as TensorFlow and PyTorch, and many models are published as Python code on GitHub. However, there are still many difficulties in putting an algorithm model to work in a specific business scenario. Taking the recommendation system as an example, the following problems are encountered:

How to generate training data samples? In recommendation scenarios it is often necessary to join user feedback signals (exposure, click, etc.) with various feature sources, perform the necessary feature cleaning and extraction, and carry out complex data processing such as validation-set splitting and negative sampling. These steps are usually impossible inside a machine learning framework, so a big data system is needed to process batch or streaming data.

How to transfer the training data generated by the big data platform to a deep learning framework? Frameworks such as TensorFlow and PyTorch have their own data input formats and corresponding DataLoader interfaces, requiring developers to parse the data themselves, and issues such as data fragmentation and variable-length features often puzzle algorithm engineers.

How to run distributed training that can freely schedule cluster resources, including GPUs? This may require a dedicated operations team to manage machine-learning-related hardware scheduling.

How to train sparse-feature models and use NLP pre-trained models? In the recommendation scenario we need to handle sparse features at large scale to model the interest relationship between users and goods, while multimodal model fusion has gradually become a frontier direction.

How to make online predictions efficiently after model training? In addition to cluster resources, elastic scheduling and load balancing are involved; these problems require dynamic resource allocation in a heterogeneous environment with CPUs, GPUs, and NPUs. For complex models, distillation, quantization, and other means are necessary to ensure predictive performance.

After the model goes online, how does the online system extract and join features, ensure online/offline feature consistency, and evaluate the algorithm's effect quantitatively? An online algorithm application framework is needed that can integrate with the online system, read all kinds of online data sources, and provide multi-layer ABTest traffic experiments.

Finally, how to iterate efficiently on algorithm experiments? Engineers want to run multiple experiments in parallel quickly to improve business performance, rather than get bogged down in complex system environment configurations.

In large Internet companies these problems are usually solved by multiple teams and different systems; the overall recommendation-system architecture shared by Netflix's algorithm engineering team is a good illustration. Building a complete, industrial-grade recommendation system is complex and tedious, requiring considerable knowledge in different fields and substantial engineering investment. SMEs lack both the staffing and a standardized, one-stop platform to solve these problems.
MetaSpore's one-stop machine learning platform

DMetaSoul's original intention in developing and open-sourcing MetaSpore is to help enterprises and developers solve the various problems encountered in algorithm business development, and to provide a one-stop development experience through standardized components and development interfaces, so that enterprises and developers can adopt best practices for algorithmic business development. Specifically, MetaSpore is built around the following core functional design concepts:

Seamless integration of model training with the big data system. Training can directly read structured and unstructured data from all kinds of data lakes and warehouses, and data and feature preprocessing are linked directly to model training, saving the tedious data import/export and format conversion.

Support for sparse features. Large-scale sparse Embedding layer training is necessary for search and recommendation scenarios. Sparse features also need special handling, such as cross combinations and variable-length feature pooling, which requires dedicated support from the training framework.

High-performance online prediction services. The online prediction service supports neural networks (including sparse Embeddings), decision trees, and a variety of traditional machine learning models, and supports heterogeneous hardware acceleration, reducing the engineering threshold for online deployment.

Unified online/offline feature calculation. Through a unified feature format and calculation logic, the same feature computation is shared online and offline, avoiding duplicated development across multiple systems and ensuring online/offline feature consistency.

An online algorithm application framework. The framework covers the common function points of online systems, such as automatic feature extraction from multiple data sources, feature calculation, the prediction service interface, dynamic experiment configuration, and ABTest traffic splitting.

Embracing open source. The MetaSpore platform provides several homegrown components to implement these core function points. At the same time, MetaSpore's development philosophy is to embrace the mature open-source ecosystem as much as possible without fragmenting it, lowering the barrier to learning and enabling developers to build quickly on their existing experience.

Through these core function points and design concepts, MetaSpore integrates data, algorithms, and online systems to provide a one-stop, full-process development experience.

MetaSpore's integration with the big data ecosystem

A large class of machine learning algorithms deals with structured tabular data for prediction, classification, regression, and modeling; CTR estimation in recommendation and advertising and financial risk control are common examples. In this kind of algorithm practice, data preprocessing is very important: the quality of the data directly determines the final model quality. Common data processing includes feature joining, null value filling, discretization and bucketing, outlier filtering, feature engineering, and sample generation such as validation-set splitting and random shuffling, as in the sketch below. In engineering practice these steps usually rely on a big data system, while deep learning frameworks lack comprehensive big data processing capabilities.
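To make those preprocessing steps concrete, here is a minimal sketch in plain PySpark; the paths, column names, and bucket boundaries are hypothetical and only illustrate the kind of processing described above, they are not part of MetaSpore.

    # Hypothetical preprocessing sketch in plain PySpark; paths, columns, and
    # bucket boundaries are made up for illustration.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import Bucketizer

    spark = SparkSession.builder.appName("preprocess-sketch").getOrCreate()

    # Join user feedback signals (clicks) with a feature source
    clicks = spark.read.parquet("hdfs:///example/clicks")        # user_id, item_id, label
    user_features = spark.read.parquet("hdfs:///example/users")  # user_id, age, ...
    samples = clicks.join(user_features, on="user_id", how="left")

    # Null value filling and outlier filtering
    samples = samples.fillna({"age": 0}).filter("age < 120")

    # Discretization / bucketing of a continuous feature
    bucketizer = Bucketizer(splits=[float("-inf"), 18, 30, 45, 60, float("inf")],
                            inputCol="age", outputCol="age_bucket")
    samples = bucketizer.transform(samples)

    # Sample generation: split into training and validation sets
    train_df, valid_df = samples.randomSplit([0.9, 0.1], seed=42)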
In this situation, many problems arise in connecting data with training. Big data systems have many storage formats and engines, making it impractical for a deep learning framework to support each one individually. Traditionally, algorithm engineers have to handle format conversion themselves, converting the big data system's formats into formats the deep learning framework can read. This is not only cumbersome and a source of data redundancy, it also makes incremental and streaming learning hard to implement.

For such scenarios MetaSpore provides an offline training framework based on the Parameter Server architecture. The MetaSpore offline training framework integrates seamlessly with PySpark, allowing a Spark DataFrame to be fed directly into model training with no format conversion. Spark supports multiple data sources, including CSV, Parquet, and ORC; it also supports data lakes and warehouses such as Hive, and streaming data such as Kafka. By docking with PySpark, MetaSpore can train directly on every data source Spark supports, greatly simplifying the data processing pipeline.

In terms of implementation, MetaSpore takes advantage of the Pandas UDF functionality provided by PySpark. A Pandas UDF is a vectorized Python UDF interface that uses Arrow as an efficient data transfer protocol between the JVM and Python. The worker end of the MetaSpore training framework is a Pandas UDF: it receives a batch of data and feeds it to the model training module, and training outputs such as predicted values are sent back to Spark in Arrow format to form a new DataFrame. Here is a code example:

    # Defining the neural network
    module = metaspore.algos.MLP(...)
    # Create a PyTorch distributed training estimator
    estimator = metaspore.PyTorchEstimator(module=module, ...)
    # Read training and test sets via Spark
    train_df = spark_session.read.csv("hdfs://.../movielens/train/...")
    test_df = spark_session.read.csv("hdfs://.../movielens/test/...")
    # Train the model on the training set
    model = estimator.fit(train_df)
    # Use the trained model to make predictions on the test set
    test_prediction_result = model.transform(test_df)
    # The predicted result is a Spark DataFrame, so any DataFrame operation can be invoked on it
    evaluator = pyspark.ml.evaluation.BinaryClassificationEvaluator(...)
    auc = evaluator.evaluate(test_prediction_result)
    print(auc)

You can see that MetaSpore integrates seamlessly with Spark: the read.csv calls in the middle can be replaced by any data source Spark supports, and the model can be trained after arbitrarily complex data processing in Spark.
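To illustrate the Pandas UDF mechanism mentioned above, here is a minimal, hypothetical sketch of a vectorized UDF that receives Arrow-backed batches as pandas Series; the scoring logic is made up and this is not MetaSpore's actual worker code.

    # Hypothetical sketch of the Pandas UDF mechanism; not MetaSpore's worker implementation.
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()
    df = spark.createDataFrame([(0.2,), (1.5,), (3.0,)], ["feature"])

    @pandas_udf(DoubleType())
    def score(feature: pd.Series) -> pd.Series:
        # Each call receives a whole batch (transferred via Arrow), not a single row;
        # a real training worker would forward the batch to the model instead.
        return feature * 2.0

    df.withColumn("prediction", score("feature")).show()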
MetaSpore's support for sparse discrete feature training

In search, advertising, recommendation, and other personalization scenarios, a large class of features is sparse and discrete. Such a feature is usually converted to a fixed ID by one-hot encoding and mapped to an embedding vector that is learned as a parameter. Two problems arise here: first, the feature ID space is very large, usually well over the million level; second, the ID space is not fixed, and the values of some discrete features keep changing. For the first problem MetaSpore uses the Parameter Server architecture to shard the embedding table; for the second, MetaSpore provides a complete dynamic sparse feature learning scheme, storing the embedding table in a hash table so that embedding vectors can be added and removed. Hashing of the raw discrete values, multi-value combination, and multi-value pooling are integrated and exposed as a PyTorch Module for recommendation scenarios. A sparse MLP model can be defined with MetaSpore as follows:

    class MLPLayer(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # Sparse part: hash, combine, and sum-pool raw string features, then concatenate
            self.sparse = metaspore.EmbeddingSumConcat(
                deep_embedding_dim, deep_column_name_path, deep_combine_schema_path)
            dense_layers = []
            dnn_linear_num = len(hidden_lay_unit)
            dense_layers.append(metaspore.nn.Normalization(feature_dim * embedding_size))
            dense_layers.append(torch.nn.Linear(feature_dim * embedding_size, hidden_lay_unit[0]))
            dense_layers.append(torch.nn.ReLU())
            for i in range(dnn_linear_num - 1):
                dense_layers.append(torch.nn.Linear(hidden_lay_unit[i], hidden_lay_unit[i + 1]))
                dense_layers.append(torch.nn.ReLU())
            dense_layers.append(torch.nn.Linear(hidden_lay_unit[-1], 1))
            self.dnn = torch.nn.Sequential(*dense_layers)

        def forward(self, input_raw_strings):
            sparse = self.sparse(input_raw_strings)
            return self.dnn(sparse)

In the code above, the EmbeddingSumConcat layer accepts sample input of raw string type and automatically performs hashing, combination, and pooling (Sum by default). The input_raw_strings argument of the forward method is a batch of table data read by PySpark, i.e. a Pandas DataFrame object. Compared with TorchRec and other recommendation-oriented deep learning frameworks, the API surface is smaller and the call logic is more direct and easier to understand.

MetaSpore model Serving service

The MetaSpore platform includes an online model serving (prediction) service. Unlike TF Serving, MetaSpore Serving has the following characteristics:

It supports a wider range of models. In addition to the sparse NN models produced by the offline training framework, MetaSpore Serving also supports NLP, CV, and other NN models; XGBoost, LightGBM, and other decision-tree models; and models from machine learning libraries such as Spark ML and scikit-learn.

It integrates feature processing logic. Feature calculation is an integral part of model serving, for example sparse discrete feature processing or a Tokenizer in NLP. Traditionally this logic is implemented twice, once online and once offline, which is inefficient and prone to inconsistencies; MetaSpore Serving integrates the computational logic for sparse features and shares one codebase between online and offline.

It supports heterogeneous hardware acceleration across CPU, GPU, and NPU. Model prediction usually requires selecting different hardware in different scenarios, and MetaSpore Serving supports common hardware for heterogeneous computing.

Inside MetaSpore Serving, ONNX Runtime is used as the computation library, with feature processing as the pre-prediction logic. The prediction of each model, including several feature-table computations and the final ONNX model computation, forms a DAG; in a typical Wide & Deep model, for example, the serving-time computation is a DAG in which the feature computations feed the ONNX model. In this way MetaSpore Serving can easily integrate various feature extraction and calculation modules, maximizing alignment with the offline logic and simplifying the development experience. A minimal sketch of exporting a model to ONNX and running it with ONNX Runtime follows.
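As an illustration of the ONNX path, this is generic PyTorch and ONNX Runtime usage, not MetaSpore Serving's internal code; the model, file name, and shapes are hypothetical.

    # Generic ONNX export/inference sketch; the model and file name are illustrative.
    import numpy as np
    import torch
    import onnxruntime as ort

    model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
    model.eval()

    # Export the dense part of the model to ONNX
    dummy_input = torch.randn(1, 10)
    torch.onnx.export(model, dummy_input, "dense_model.onnx",
                      input_names=["features"], output_names=["score"],
                      dynamic_axes={"features": {0: "batch"}})

    # Load and run it with ONNX Runtime (CPU here; a GPU provider could be selected instead)
    session = ort.InferenceSession("dense_model.onnx", providers=["CPUExecutionProvider"])
    batch = np.random.rand(4, 10).astype(np.float32)
    outputs = session.run(["score"], {"features": batch})
    print(outputs[0].shape)  # (4, 1)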
MetaSpore's unified online/offline feature calculation

For table-style models, sparse or dense, MetaSpore takes an Arrow Table as the offline input and expresses the feature calculation logic as an Arrow Compute execution plan, with an expression evaluated per feature. For discrete features, hashing and multi-value combinations are supported; for dense features, normalization, binarization, and bucketing are supported. Much of this calculation can be done directly with Arrow's built-in expressions; for computations Arrow does not provide, custom expressions can be registered through a custom compute kernel. In terms of implementation, the feature-operation expressions used offline are serialized and saved in the directory exported with the model; when the online prediction service loads them, the same Arrow Compute expressions are reconstructed, so a completely consistent feature calculation is shared between online and offline. In addition, because Arrow Compute supports filtering, joins, and other operations, multiple input batches are also supported. Thanks to Arrow's columnar expression design, these calculations are vectorized and achieve relatively high performance (a small Arrow Compute sketch appears after the client example below).

Offline feature calculation and online input share the unified Arrow Table format, and Arrow provides multi-language interfaces for creating Arrow tables, so MetaSpore Serving is easy to call from multiple languages. For example, for an XGBoost model with ten float-type feature inputs, a Python program that calls the MetaSpore prediction service looks like this:

    import grpc
    import metaspore_pb2
    import metaspore_pb2_grpc
    import pyarrow as pa

    with grpc.insecure_channel("...") as channel:
        stub = metaspore_pb2_grpc.PredictStub(channel)
        # Construct a one-row feature sample
        row = []
        values = [...]  # ten float feature values
        for i in range(10):
            row.append(pa.array([values[i]], type=pa.float32()))
        # Create an Arrow RecordBatch
        rb = pa.RecordBatch.from_arrays(row, [f"field_{i}" for i in range(10)])
        # Serialize the RecordBatch
        sink = pa.BufferOutputStream()
        with pa.ipc.new_file(sink, rb.schema) as writer:
            writer.write_batch(rb)
        # Construct the gRPC request for MetaSpore Serving
        payload_map = {"input": sink.getvalue().to_pybytes()}
        request = metaspore_pb2.PredictRequest(model_name="xgboost_model", payload=payload_map)
        # Call the Serving prediction service
        reply = stub.Predict(request)
        for name in reply.payload:
            with pa.BufferReader(reply.payload[name]) as reader:
                tensor = pa.ipc.read_tensor(reader)
                print(f"Predict Output {name}: {tensor.to_numpy()}")

As can be seen, online prediction also takes an Arrow Table of input features and MetaSpore Serving can be invoked directly; the invocation is unified with the offline side.
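To make the Arrow Compute idea concrete, here is a small, hypothetical sketch of vectorized feature expressions evaluated on an Arrow Table; this is generic pyarrow.compute usage with made-up column names, not MetaSpore's serialized expression format.

    # Generic pyarrow.compute sketch of vectorized feature expressions; columns are illustrative.
    import pyarrow as pa
    import pyarrow.compute as pc

    table = pa.table({
        "age": pa.array([21.0, 35.0, 67.0], type=pa.float32()),
        "city": pa.array(["tokyo", "osaka", "tokyo"]),
    })

    # Dense feature: min-max normalization built from Arrow Compute kernels
    age = table["age"]
    mm = pc.min_max(age)
    age_norm = pc.divide(pc.subtract(age, mm["min"]), pc.subtract(mm["max"], mm["min"]))

    # Discrete feature: dictionary-encode string values into integer codes
    city_codes = pc.dictionary_encode(table["city"])

    features = table.append_column("age_norm", age_norm).append_column("city_code", city_codes)
    print(features.to_pydict())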
MetaSpore online algorithm application framework

With the offline training framework and the online Serving service in place, one final step remains before an algorithm lands in a business scenario: the online algorithm experiment. To verify the validity of an algorithm model in a business scenario, a baseline needs to be established and compared with the new model. This requires an online experiment framework that makes it easy to define algorithm experiments, read online features, and call the model prediction service, and that can split traffic across multiple experiments for ABTest comparison. A configuration center is also needed for fast experiment iteration: it should dynamically load and refresh experiment and traffic-split configurations, support hot reloading of experiment parameters, and provide debugging and tracing. This last mile directly determines whether an AI model can actually be put into a business application.

MetaSpore provides an online algorithm application framework that covers the whole online experiment flow. The framework is based on Spring Boot and Spring Cloud and provides the following core functions.

Experiment pipeline. Developers can register experiment classes with simple annotations:

    @ExperimentAnnotation(name = "rank.widedeep")
    @Component
    public class WideDeepRankExperiment implements BaseExperiment<RecommendResult, RecommendResult> {
        @Override
        public void initialize(Map<String, Object> map) { ... }

        @SneakyThrows
        @Override
        public RecommendResult run(Context context, RecommendResult recommendResult) { ... }
    }

The Spring Cloud configuration center is then used to assemble multiple experiment flows dynamically, including traffic splitting for horizontal experiments and the ordering of vertical experiment layers. Taking a recommendation system as an example, with two experiments in the recall (match) layer and two experiments in the ranking layer, and the two layers orthogonal to each other, the online experiment flow can be assembled with a configuration like the following:

    scene-config:
      scenes:
        - name: guess-you-like
          layers:
            - name: match
              normalLayerArgs:
                - experimentName: match.base
                  ratio: ...
                - experimentName: match.multiple
                  ratio: ...
            - name: rank
              normalLayerArgs:
                - experimentName: rank.wideDeep
                  ratio: ...
                - experimentName: rank.lightGBM
                  ratio: ...
          experiments:
            - layerName: match
              experimentName: match.base
              extraExperimentArgs:
                modelName: match_base
                matcherNames: [ ItemCfMatcher ]
            - layerName: match
              experimentName: match.multiple
              extraExperimentArgs:
                modelName: match_multiple
                matcherNames: [ ItemCfMatcher, SwingMatcher, TwoTowersMatcher ]
            - layerName: rank
              experimentName: rank.wideDeep
              extraExperimentArgs:
                modelName: movie_lens_wdl
                ranker: WideAndDeepRanker
                maxReservation: ...
            - layerName: rank
              experimentName: rank.lightGBM
              extraExperimentArgs:
                modelName: lightgbm_test_model
                ranker: LightGBMRanker
                maxReservation: ...

Once the configuration and the experiment class implementations are added to the Spring Boot project, the framework automatically creates and initializes the experiment objects, executes them in the order of the experiment layers, and in each layer automatically selects one experiment to execute according to the traffic-split configuration. Developers can freely define the inputs and outputs passed between experiments, which provides a high degree of flexibility. The configuration file lives in the configuration center, so experiment and traffic-split configuration can be hot-updated and take effect in real time. A minimal sketch of ratio-based traffic selection is shown at the end of this section.

Online feature extraction framework. For online prediction we need to construct an Arrow Table of features, but online features can come from multiple sources: various databases, caches, and upstream services. Based on Spring Boot JPA, MetaSpore encapsulates a feature-extraction code generation framework that automatically generates the database access code once the developer defines the schema and data source of a feature. MetaSpore also provides a client implementation for online Serving that supports Spring Boot and offers annotation-based injection to access the Serving service.
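To illustrate the idea of ratio-based traffic splitting within a layer, here is a generic sketch of the concept, not MetaSpore's actual implementation; the experiment names and ratios are made up.

    # Generic sketch of deterministic, ratio-based experiment selection within one layer;
    # not MetaSpore's implementation. Experiment names and ratios are illustrative.
    import hashlib

    def pick_experiment(user_id: str, layer: str, experiments: list[tuple[str, float]]) -> str:
        """Deterministically map a user to one experiment according to traffic ratios."""
        digest = hashlib.md5(f"{layer}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 10000 / 10000.0  # stable value in [0, 1)
        cumulative = 0.0
        for name, ratio in experiments:
            cumulative += ratio
            if bucket < cumulative:
                return name
        return experiments[-1][0]  # fall back to the last experiment

    rank_layer = [("rank.wideDeep", 0.5), ("rank.lightGBM", 0.5)]
    print(pick_experiment("user-42", "rank", rank_layer))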
MetaSpore embraces the open source ecosystem

MetaSpore itself is an open-source project, and it builds on a mix of mature open-source ecosystems. In offline training, MetaSpore's training framework bridges PySpark and PyTorch, two popular frameworks in big data and deep learning, seamlessly combining their ecosystems to better address the pain points of real business scenarios. The online service is fully integrated with gRPC, Spring Boot, and Spring Cloud, allowing developers to land big-data intelligent applications on top of their existing standard technology stack. MetaSpore's philosophy is to embrace the open-source ecosystem fully: for offline training of large NLP models, for example, developers can still use existing open-source frameworks such as OneFlow, while MetaSpore provides the ability to incorporate the pre-trained large model into multimodal model learning, as well as online Serving of the NLP model, making large NLP models more accessible.

Conclusion

This article has detailed the design philosophy and core features of MetaSpore's one-stop machine learning platform to help readers better understand MetaSpore. Want to learn more about implementing MetaSpore in business scenarios? Next week I will post the detailed steps for using MetaSpore to build an industrial recommendation system rapidly. 2022-04-30 16:23:43
Apple AppleInsider - Frontpage News Bullstrap MagSafe wallet review: Same as Apple's but with better leather https://appleinsider.com/articles/22/04/30/bullstrap-magsafe-wallet-review-same-as-apples-but-with-better-leather?utm_medium=rss Bullstrap MagSafe wallet review: same as Apple's, but with better leather. Bullstrap has made a name for itself with quality leather goods. Its recent MagSafe wallet looks and feels incredible but offers no innovation beyond recreating Apple's original design. MagSafe debuted with the iPhone line and was subsequently brought to the following iPhone lineup; with MagSafe, users can mount wallets to their phones, attach their phones to stands, connect a magnetic battery pack, or charge from a power outlet. 2022-04-30 16:45:08
Apple AppleInsider - Frontpage News Apple clarifies conditions for App Store app removal, extends update deadline to 90 days https://appleinsider.com/articles/22/04/30/apple-clarifies-conditions-for-app-store-app-removal-extends-update-deadline-to-90-days?utm_medium=rss Apple clarifies conditions for App Store app removal and extends the update deadline to 90 days. After a week of confusion and speculation, Apple has attempted to address developers' concerns about the company removing outdated apps from the App Store. In mid-April, Apple had notified developers that any app that has not been updated for a significant amount of time would be pulled from the App Store; however, developers would be able to avoid removal by submitting an update for review within 90 days. Developers took to social media to speak up against the policy, stating that the changes were unfair to indie developers. 2022-04-30 16:34:46
News BBC News - Home Watford 1-2 Burnley: Two late goals lift Clarets five points clear of relegation zone https://www.bbc.co.uk/sport/football/61198625?at_medium=RSS&at_campaign=KARANGA Watford 1-2 Burnley: two late goals lift the Clarets five points clear of the relegation zone. Burnley score twice in the final seven minutes to move five points clear of the relegation zone and leave fellow strugglers Watford on the brink of an immediate return to the Championship. 2022-04-30 16:36:35
News BBC News - Home Aston Villa 2-0 Norwich City: Canaries relegated from Premier League https://www.bbc.co.uk/sport/football/61198620?at_medium=RSS&at_campaign=KARANGA villa 2022-04-30 16:40:13
