IT |
ITmedia - All Articles |
[ITmedia Business Online] Subscription services recommended by users: "Apple Music" ranks third; which are No. 2 and No. 1? |
https://www.itmedia.co.jp/business/articles/2204/23/news029.html
|
amazonprimevideo |
2022-04-23 05:15:00 |
AWS |
AWS |
AWS Leadership - Rahul, VP of Data & Analytics | Amazon Web Services |
https://www.youtube.com/watch?v=WRC4glPF6Xs
|
AWS Leadership - Rahul, VP of Data & Analytics | Amazon Web Services. At AWS, "Leader" is an action verb. Expertise and diversity are valued, and leaders succeed here. Come see for yourself. View open roles at AWS. Learn about AWS culture. Subscribe. More AWS videos. More AWS events videos. ABOUT AWS: Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, offering fully featured services from data centers globally. Millions of customers, including the fastest-growing startups, largest enterprises, and leading government agencies, are using AWS to lower costs, become more agile, and innovate faster. #WorkingAtAWS #AWSCareers #AWS #AmazonWebServices #CloudComputing |
2022-04-22 20:51:50 |
AWS |
AWS |
AWS Leadership - Shafiq, Director, AWS Commercial Sales | Amazon Web Services |
https://www.youtube.com/watch?v=mVZuG-G0Jb4
|
AWS Leadership - Shafiq, Director, AWS Commercial Sales | Amazon Web Services. At AWS, "Leader" is an action verb. Expertise and diversity are valued, and leaders succeed here. Come see for yourself. View open roles at AWS. Learn about AWS culture. Subscribe. More AWS videos. More AWS events videos. ABOUT AWS: Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, offering fully featured services from data centers globally. Millions of customers, including the fastest-growing startups, largest enterprises, and leading government agencies, are using AWS to lower costs, become more agile, and innovate faster. #WorkingAtAWS #AWSCareers #AWS #AmazonWebServices #CloudComputing |
2022-04-22 20:51:41 |
golang |
New posts tagged "Go" - Qiita |
[Golang] Turning CSV files into structs and structs into CSV with an echo server |
https://qiita.com/Sunochi/items/dc7463ad39482004b1be
|
github |
2022-04-23 05:11:06 |
Overseas TECH |
Ars Technica |
Our first impressions after driving FedEx’s new electric delivery van |
https://arstechnica.com/?p=1849985
|
delivery |
2022-04-22 20:40:06 |
Overseas TECH |
Ars Technica |
Report: Sonos will finally make a soundbar that’s almost affordable |
https://arstechnica.com/?p=1849791
|
channel |
2022-04-22 20:03:32 |
Overseas TECH |
MakeUseOf |
10 Great IFTTT Applets to Automate Your iPhone or Android Phone |
https://www.makeuseof.com/tag/ifttt-applets-automate-android/
|
device |
2022-04-22 20:45:14 |
Overseas TECH |
MakeUseOf |
Duolingo vs. Rosetta Stone: Which Language Learning App Is Better? |
https://www.makeuseof.com/duolingo-vs-rosetta-stone/
|
learning |
2022-04-22 20:45:13 |
Overseas TECH |
MakeUseOf |
How to Create Bookmark Folders in Brave, Chrome, and Edge |
https://www.makeuseof.com/create-bookmark-folders-brave-chrome-edge/
|
How to Create Bookmark Folders in Brave, Chrome, and Edge. Bookmark folders are a great way to organize bookmarks, even if they do add an extra few clicks. Here's how to create them in Brave, Chrome, and Edge. |
2022-04-22 20:30:13 |
Overseas TECH |
MakeUseOf |
How Instagram Is Going to Start Rewarding Original Content |
https://www.makeuseof.com/instagram-rewarding-original-content/
|
content, instagram |
2022-04-22 20:15:14 |
Overseas TECH |
MakeUseOf |
How to Find All Running Processes Using WMIC in Windows 11 |
https://www.makeuseof.com/windows-11-wmic-find-running-processes/
|
windows |
2022-04-22 20:15:13 |
Overseas TECH |
DEV Community |
How web browsers work - part 1 (with illustrations) |
https://dev.to/arikaturika/how-web-browsers-work-part-1-with-illustrations-1nid
|
How web browsers work - part 1 (with illustrations). Browsers (also called web browsers or Internet browsers) are software applications installed on our devices that allow us to access the World Wide Web; you are actually using one while reading this text. There are many browsers in use today, and as of the time of writing the most used ones were Google Chrome, Apple's Safari, Microsoft Edge and Firefox. But how do they actually work, and what happens from the moment we type a web address into the address bar until the page we are trying to access gets displayed on our screen? An over-simplified version would be that when we request a web page from a particular website, the browser retrieves the necessary content from a web server and then displays the page on our device. Pretty straightforward, right? Yes, but there is more involved in this seemingly simple process. In this series we are going to talk about the navigation, data fetching, parsing and rendering steps, and hope to make these concepts clearer to you.

NAVIGATION. Navigation is the first step in loading a web page. It refers to the process that happens when the user requests a web page, whether by clicking a link, writing a web address in the browser's address bar, submitting a form, and so on.

DNS lookup (resolving the web address). The first step in navigating to a web page is finding where the assets for that page are located (HTML, CSS, JavaScript and other kinds of files). For us, websites are domain names, but for computers they are IP addresses: the HTML page we request lives on a server with a particular IP address. If we have never visited the site before, a Domain Name System (DNS) lookup must happen. DNS servers contain a database of public IP addresses and their associated hostnames; this is commonly compared to a phonebook, in which people's names are associated with particular phone numbers. In most cases these servers resolve, or translate, those names to IP addresses as requested; DNS root servers are distributed across the world. So when we request a DNS lookup, what we actually do is interrogate one of these servers and ask which IP address corresponds to the name. If a corresponding IP is found, it is returned; if the lookup is not successful, we will see some kind of error message in the browser. After this initial lookup the IP address will probably be cached for a while, so subsequent visits to the same website will happen faster, since there is no need for another DNS lookup (a DNS lookup only happens the first time we visit a website).

TCP (Transmission Control Protocol) handshake. Once the web browser knows the IP address of the website, it will try to set up a connection to the server via a TCP three-way handshake, also called SYN-SYN-ACK (or, more accurately, SYN, SYN-ACK, ACK), because three messages are transmitted by TCP to negotiate and start a TCP session between two computers. TCP stands for Transmission Control Protocol, a communications standard that enables application programs and computing devices to exchange messages over a network; it is designed to send packets of data across the Internet and ensure the successful delivery of data and messages over networks. The TCP handshake is a mechanism designed so that two entities (in our case the browser and the server) that want to pass information back and forth can negotiate the parameters of the connection before transmitting data. If the browser and the server were two people, the conversation would go something like this: the browser sends a SYN message to the server and asks for SYNchronization (synchronization here means establishing the connection); the server replies with a SYN-ACK message (SYNchronization and ACKnowledgement); in the last step the browser replies with an ACK message.

TLS negotiation. Now that the TCP connection (a two-way connection) has been established through the three-way handshake, the TLS negotiation can begin. For secure connections established over HTTPS, another handshake is needed. This handshake, the TLS negotiation, determines which cipher will be used to encrypt the communication, verifies the server, and establishes that a secure connection is in place before beginning the actual transfer of data. Transport Layer Security (TLS), the successor of the now-deprecated Secure Sockets Layer (SSL), is a cryptographic protocol designed to provide communications security over a computer network. The protocol is widely used in applications such as email and instant messaging, but its use in securing HTTPS remains the most publicly visible. Since applications can communicate either with or without TLS (or SSL), it is necessary for the client (the browser) to request that the server set up a TLS connection. During this step more messages are exchanged between the browser and the server, and things usually go as follows. Client hello: the browser sends the server a message that includes which TLS versions and cipher suites it supports, plus a string of random bytes known as the "client random". Server hello and certificate: the server sends back a message containing the server's SSL certificate, the server's chosen cipher suite, and the "server random", another random string of bytes generated by the server. Authentication: the browser verifies the server's SSL certificate with the certificate authority that issued it, so the browser can be sure that the server is who it says it is. The premaster secret: the browser sends one more random string of bytes, the "premaster secret", encrypted with a public key that the browser takes from the server's SSL certificate; the premaster secret can only be decrypted with the private key, by the server. Private key used: the server decrypts the premaster secret. Session keys created: the browser and server generate session keys from the client random, the server random and the premaster secret. Client finished: the browser sends a message to the server saying it has finished. Server finished: the server sends a message to the browser saying it has also finished. Secure symmetric encryption achieved: the handshake is completed and communication can continue using the session keys. Now requesting and receiving data from the server can begin. Image source: Taras Shypka (bugsster) on Unsplash. |
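To make the navigation steps above concrete, here is a minimal Java sketch (an illustration, not part of the article): it resolves a hostname with a DNS lookup and then issues an HTTPS GET, letting the standard java.net.http client perform the TCP handshake and TLS negotiation described above. The host example.com is only a placeholder.

    import java.net.InetAddress;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NavigationSketch {
        public static void main(String[] args) throws Exception {
            // DNS lookup: resolve the domain name to an IP address.
            // On repeat visits the resolver may answer from its cache.
            InetAddress address = InetAddress.getByName("example.com");
            System.out.println("Resolved IP: " + address.getHostAddress());

            // Requesting the page over https:// makes the client open the TCP
            // connection (three-way handshake) and run the TLS negotiation for us.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
        }
    }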
2022-04-22 20:45:24 |
Overseas TECH |
DEV Community |
Apache Spark, Hive, and Spring Boot — Testing Guide |
https://dev.to/kirekov/apache-spark-hive-and-spring-boot-testing-guide-mdp
|
Apache Spark Hive and Spring Boot ーTesting GuideBig Data is trending The companies have to operate with a huge amount of data to compete with others For example this information is used to show you the relevant advertisements and recommend you the services that you may find interesting The problem with Big Data software systems is their complexity Testing becomes tough How could you verify the app behaviour locally when it s tuned to connect to the HDFS cluster In this article I m showing you how to create a Spring Boot app that loads data from Apache Hive via Apache Spark to the Aerospike Database More than that I m giving you a recipe for writing integration tests for such scenarios that can be run either locally or during the CI pipeline execution The code examples are taken from this repository Firstly let s get over some basic concepts of the Big Data stack we re using Don t worry it won t take long But it s necessary to understand the core idea Basics of HDFSHDFS Hadoop Distributed File System is a distributed file system designed to run on many physical servers So a file in HDFS is an abstraction that hides the complexity of storing and replicating the data between multiple nodes Why do we need HDFS There are some reasons Hardware FailuresHard disk drives crash That s the reality we have to deal with If a file is split between multiple nodes individual failures won t affect the whole data Besides data is replicated in HDFS So even after a disk crash the information can be restored from the other sources Really Large FilesHDFS allows building of a network of not so powerful machines into a huge system For example if you have nodes with TB disk storage on each one then you possess TB of HDFS space If the replication factor equals it s possible to store a single file with a size of TB Not to mention that lots of local file systems do not support so large files even if you have the available disk space The Speed of ReadingIf you read the file sequentially it will take you N But if the file is split into chunks between nodes you can get its content in N time Because each node can read the chunk in parallel So HDFS is not only about safety It s about swiftness We have omitted the time spend on network communications But if files are huge this part is just a fraction Basics of Apache HiveApache Hive is the database facility running over HDFS It allows querying data with HQL SQL like language Regular databases e g PostgreSQL Oracle act as an abstraction layer over the local file system While Apache Hive acts as an abstraction over HDFS That s it Basics of Apache SparkApache Spark is a platform for operating and transforming huge amounts of data The key idea is that Apache Spark workers run on multiple nodes and store the intermediate results in RAM It s written in Scala but it also supports Java and Python Take a look at the schema below It s the common representation of the Apache Spark batch job Apache Spark loads data from Data Producer proceeds some operations on it and puts the result to Data Consumer in our case Apache Hive is data producer and Aerospike is data consumer Apache Spark application is a regular jar file that contains the transformation logic Take a look at the example below JavaRDD lt String gt textFile sc textFile hdfs raw data txt JavaPairRDD lt String Integer gt counts textFile flatMap s gt Arrays asList s split iterator mapToPair word gt new Tuple lt gt word reduceByKey a b gt a b counts saveAsTextFile hdfs words count txt It s a simple word count application Firstly 
we load the content of the raw data txt HDFS file Then we split each line by assign for every word and reduce the result by words to summarize the whole numbers Then the obtained pairs are saved to word count txt The flow is similar to Java Stream API The difference is that every lambda expression is executed on the workers So Spark transfers the code to the remote machines performs the calculation and returns the obtained results If we owe a sufficient number of workers we can proceed with the amount of data that is measured by terabytes or even zettabytes The Apache Spark approach of delivering code to data has some drawbacks We ll discuss it when we get to the development Another important aspect is laziness Just like Stream API Apache Spark does not start any calculations until terminal operation invocation In this case reduceByKey is the one The rest operations build the pipeline rules but do not trigger anything Build ConfigurationLet s start the development process Firstly we need to choose the Java version At the moment of writing the latest stable Apache Spark release is It supports Java So we gonna use it Currently Apache Spark does not support Java Make sure you don t use it for running integration tests Otherwise you ll get bizarre error messages The project is bootstrapped with Spring Initializr Nothing special here But the dependencies list should be clarified Dependencies Resolutionext set testcontainersVersion set sparkVersion set slfjVersion set aerospikeVersion dependencies annotationProcessor org springframework boot spring boot configuration processor implementation org springframework boot spring boot starter validation exclude group org slfj implementation com aerospike aerospike client aerospikeVersion exclude group org slfj compileOnly org apache spark spark core sparkVersion compileOnly org apache spark spark hive sparkVersion compileOnly org apache spark spark sql sparkVersion compileOnly org slfj slfj api slfjVersion testImplementation org apache derby derby testImplementation org apache spark spark core sparkVersion testImplementation org apache spark spark hive sparkVersion testImplementation org apache spark spark sql sparkVersion testImplementation org springframework boot spring boot starter test testImplementation org slfj slfj api slfjVersion testImplementation org codehaus janino janino testImplementation org testcontainers junit jupiter testImplementation org awaitility awaitility testImplementation org hamcrest hamcrest all Core DependenciesFirst comes Apache Spark dependencies The spark core artefact is the root The spark hive enables data retrieving from Apache Hive And the spark sql dependency gives us the ability to query data from Apache Hive with SQL usage Note that all the artefacts have to share the same version in our case it is As a matter of fact the Apache Spark dependencies version should match the one that runs the production cluster in your company All Spark dependencies have to be marked as compileOnly It means that they won t be included in the assembled jar file Apache Spark will provide the required dependencies in runtime If you include them as implementation scope that may lead to hard tracking bugs during execution Then we have aerospike client dependency You have probably noticed that the org slfj group is excluded everywhere and included as a compileOnly dependency as well We ll talk about this later when we get to the Apache Spark logging facility Test DependenciesAnd finally here comes test scoped artefacts Apache Spark ones 
are included as testImplementation Because integration tests will start the local Spark node So they are required during the runtime The slfj api is also the runtime dependency Testcontainers will be used to run the Aerospike instance The janino is required by Apache Spark during the job execution And we need Apache Derby to tune Apache Hive for local running We ll get to this point soon Logging ConfigurationApache Spark applies logj with the slfj wrapper But the default Spring Boot logger is logback This setup leads to exceptions during Spring context initializing due to multiple logging facilities present in the classpath The easiest way to solve it is to exclude all auto configured Spring Boot logging features That s not a big deal Anyway Apache Spark provides its own slfj implementation during the runtime So we just need to include this dependency as compileOnly That is sufficient Excluding logback from the Spring Boot project is easy with Gradle Take a look at the example below configurations compileOnly extendsFrom annotationProcessor all exclude group org springframework boot module spring boot starter logging exclude group org springframework boot module snakeyaml Possible application yml issuesThe snakeyml exclusion requires special attention Spring Boot uses the library to parse properties from yml files i e application yml Some Apache Spark versions use the same library for internal operations The thing is that the versions required by Spring Boot and Apache Spark differ If you exclude it from Spring Boot dependency and rely on the one provided by Apache Spark you will face the NoSuchMethodError Spring Boot invokes the method that is absent in the version provided by Apache Spark So I would recommend sticking with the properties format and removing Spring Boot YAML auto configuration That will help you to avoid unnecessary difficulties Take a look at the code example below SpringBootApplication exclude GsonAutoConfiguration class public class SparkBatchJobApplication public static void main String args SpringApplication run SparkBatchJobApplication class args Fat JarThe result jar is going to submitted to Apache Spark cluster e g spark submit command So it should contain all runtime artefacts Unfortunately the standard Spring Boot packaging does not put the dependencies in the way Apache Spark expects it So we ll use shadow jar Gradle plugin Take a look at the example below plugins id org springframework boot version id io spring dependency management version RELEASE id java id com github johnrengelman shadow version shadowJar zip true mergeServiceFiles append META INF spring handlers append META INF spring schemas append META INF spring tooling transform PropertiesFileTransformer paths META INF spring factories mergeStrategy append Now we can run all tests and build the artefact with the gradlew test shadowJar command Starting DevelopmentNow we can get to the development process Apache Spark ConfigurationWe need to declare JavaSparkContext and SparkSession The first one is the core Apache Spark for all operations Whilst SparkSession is the part of spark sql projects It allows us to query data with SQL which is quite handy for Apache Hive Take a look at the Spring configuration below Configurationpublic class SparkConfig Value spring application name private String appName Bean Profile LOCAL public SparkConf localSparkConf throws IOException final var localHivePath Files createTempDirectory hiveDataWarehouse FileSystemUtils deleteRecursively localHivePath return new SparkConf 
setAppName appName setMaster local set javax jdo option ConnectionURL jdbc derby memory local create true set javax jdo option ConnectionDriverName org apache derby jdbc EmbeddedDriver set hive stats jdbc timeout set spark ui enabled false set spark sql session timeZone UTC set spark sql catalogImplementation hive set spark sql warehouse dir localHivePath toAbsolutePath toString Bean Profile PROD public SparkConf prodSparkConf return new SparkConf setAppName appName Bean public JavaSparkContext javaSparkContext SparkConf sparkConf return new JavaSparkContext sparkConf Bean public SparkSession sparkSession JavaSparkContext sparkContext return SparkSession builder sparkContext sparkContext sc config sparkContext getConf enableHiveSupport getOrCreate SparkConf defines configuration keys for the Apache Spark job As you have noticed there are two beans for different Spring profiles LOCAL is used for integration testing and PROD is applied in the production environment The PROD configuration does not declare any properties because usually they are passed as command line arguments in the spark submit shell script On the contrary the LOCAL profile defines a set of default properties required for proper running Here are the most important ones setMaster local tells Apache Spark to start a single local node javax jdo option ConnectionURL and javax jdo option ConnectionDriverName declare the JDBC connection for Apache Hive meta storage That s why we added Apache Derby as the project dependencyspark sql catalogImplementation means that local files shall be stored in the Apache Hive compatible format spark sql warehouse dir is the directory for storing Apache Hive data Here we re using temporary directory JavaSparkContext accepts the defined SparkConf as the constructor arguments Meanwhile SparkSession wraps the existing JavaSparkContext Note that Apache Hive support should be enabled manually enableHiveSupport Creating Apache Hive TablesWhen we submit an application to the production Apache Spark cluster we probably won t need to create any Apache Hive tables Most likely the tables have already been created by someone else And our goal is to select rows and transfer the data to another storage But when we run integration tests locally or in the CI environment there are no tables by default So we need to create them somehow In this project we re working with one table media subscriber info It consists of two columns MSISDN phone number and some subscriber ID Before each test run we have to delete previous data and add new rows to ensure verifying rules consistency The easiest way to achieve it is to declare scripts for table creation and dropping We ll keep them in the resources directory Take a look at the structure below V media hqlCreates media database if it s absent create database if not exists mediaV media subscriber info hqlCreates subscriber info table if it s absent create table if not exists media subscriber info subscriber id string msisdn string row format delimitedfields terminated by lines terminated by n stored as textfileDROP V mediatv dds subscriber info hqlDrops the subscriber info table drop table if exists media subscriber infoV N prefixes are not obligatory I put them to ensure that each new table script will be executed as the last one It is helpful to make tests work deterministically OK now we need a handler to process those HQL queries Take a look at the example below Component Profile LOCAL public class InitHive private final SparkSession session private final 
ApplicationContext applicationContext public void createTables executeSQLScripts getResources applicationContext classpath hive ddl create hql public void dropTables executeSQLScripts getResources applicationContext classpath hive ddl drop hql private void executeSQLScripts Resource resources for Resource resource resources session sql readContent resource The first thing to notice is Profile LOCAL usage Because we don t need to create or drop tables in the production environment The createTables and dropTables methods provide the list of resources containing the required queries getResources is the utility function that reads files from the classpath You can discover the implementation here So now we re ready to write the business code Business Code FacadeThe core interface is EnricherServicepublic interface EnricherService void proceedEnrichment We re expecting that it might have many implementations Each one represent a step in whole batch process Then we have EnricherServiceFacade that encapsulates all implementations of EnricherService and run them one by one Servicepublic class EnricherServiceFacade private final List lt EnricherService gt enricherServices public void proceedEnrichment List lt EnrichmentFailedException gt errors new ArrayList lt gt for EnricherService service enricherServices try service proceedEnrichment catch Exception e errors add new EnrichmentFailedException Unexpected error during enrichment processing e if errors isEmpty throw new EnrichmentFailedException errors We re trying to run every provided enrichment step If any of them fails we throw the exception that combines all errors into a solid piece Finally we need to tell Spring to execute EnricherServiceFacade proceedEnrichment on application startup We could add it directly to the main method but it s not the Spring way Therefore it makes testing harder The better option is EventListener Component Profile PROD public class MainListener private final EnricherServiceFacade enricherServiceFacade EventListener public void proceedEnrichment ContextRefreshedEvent event final long startNano System nanoTime LOG info Starting enrichment process try enricherServiceFacade proceedEnrichment LOG info Enrichment has finished successfully It took Duration ofNanos System nanoTime startNano catch Exception e String err Enrichment has finished with error It took Duration ofNanos System nanoTime startNano LOG error err e throw new EnrichmentFailedException err e The proceedEnrichment method is being invoked when the Spring context is started By the way only the active PROD profile will trigger the job EnricherService ImplementationWe re going to deal with a single EnricherService implementation It simply selects all rows from the media subcriber info table and puts the result in the Aerospike database Take a look at the code snippet below Servicepublic class SubscriberIdEnricherService implements EnricherService Serializable private static final long serialVersionUID L private final SparkSession session private final AerospikeProperties aerospikeProperties Override public void proceedEnrichment Dataset lt Row gt dataset session sql SELECT subscriber id msisdn FROM media subscriber info WHERE msisdn IS NOT NULL AND subscriber id IS NOT NULL dataset foreachPartition iterator gt final var aerospikeClient newAerospikeClient aerospikeProperties iterator forEachRemaining row gt String subscriberId row getAs subscriber id String msisdn row getAs msisdn Key key new Key my namespace huawei subscriberId Bin bin new Bin msisdn msisdn 
try aerospikeClient put null key bin LOG info Record has been successfully added key catch Exception e LOG error Fail during inserting record to Aerospike e There are multiple points that has to be clarified SerializationApache Spark applies a standard Java serialization mechanism So any dependencies used inside lambdas map filter groupBy forEach etc have to implement the Serializable interface Otherwise you ll get the NotSerializableException during the runtime We have a reference to AerospikeProperties inside the foreachPartition callback Therefore this class and the SubscriberIdEnricherService itself should be allowed for serializing because the latter one keeps AerospikeProperties as a field If a dependency is not used within any Apache Spark lambda you can mark it as transient And finally the serialVersionUID manual assignment is crucial The reason is that Apache Spark might serialize and deserialize the passed objects multiple times And there is no guarantee that each time auto generated serialVersionUID will be the same It can be a reason for hard tracking floating bugs To prevent this you should declare serialVersionUID by yourself The even better approach is to force the compiler to validate the serialVersionUID field presence on any Serializable classes In this case you need to mark Xlint serial warning as an error Take a look at the Gradle example tasks withType JavaCompile options compilerArgs lt lt Xlint serial lt lt Werror Aerospike Client InstantiationUnfortunately the Java Aerospike client does not implement the Serializable interface So we have to instantiate it inside the lambda expression In that case the object will be created on a worker node directly It makes serialization redundant I should admit that Aerospike provides Aerospike Connect Framework that allows transferring data via Apache Spark in a declarative way without creating any Java clients Anyway if you want to use it you have to install the packed library to the Apache Spark cluster directly There is no guarantee that you ll have such an opportunity in your situation So I m omitting this scenario PartitioningThe Dataset class has the foreach method that simply executes the given lambda for each present row However if you initialize some heavy resource inside that callback e g database connection the new one will be created for every row in some cases there might billions of rows Not very efficient isn t it The foreachPartition method works a bit differently Apache Spark executes it once per the Dataset partition It also accepts Iterator lt Row gt as an argument So inside the lambda we can initialize heavy resources e g AerospikeClient and apply them for calculations of every Row inside the iterator The partition size is calculated automatically based on the input source and Apache Spark cluster configuration Though you can set it manually by calling the repartition method Anyway it is out of the scope of the article Testing Aerospike SetupOK we ve written some business code How do we test it Firstly let s declare Aerospike setup for Testcontainers Take a look at the code snippet below ContextConfiguration initializers IntegrationSuite Initializer class public class IntegrationSuite private static final String AEROSPIKE IMAGE aerospike aerospike server static class Initializer implements ApplicationContextInitializer lt ConfigurableApplicationContext gt static final GenericContainer lt gt aerospike new GenericContainer lt gt DockerImageName parse AEROSPIKE IMAGE withExposedPorts withEnv NAMESPACE my namespace 
withEnv SERVICE PORT waitingFor Wait forLogMessage migrations complete Override public void initialize ConfigurableApplicationContext applicationContext startContainers aerospike followOutput new SlfjLogConsumer LoggerFactory getLogger Aerospike ConfigurableEnvironment environment applicationContext getEnvironment MapPropertySource testcontainers new MapPropertySource testcontainers createConnectionConfiguration environment getPropertySources addFirst testcontainers private static void startContainers Startables deepStart Stream of aerospike join private static Map lt String Object gt createConnectionConfiguration return Map of aerospike hosts Stream of map port gt aerospike getHost aerospike getMappedPort port collect Collectors joining The IntegrationSuite class is used as the parent for all integration tests The IntegrationSuite Initializer inner class is used as the Spring context initializer The framework calls it when all properties and bean definitions are already loaded but no beans have been created yet It allows us to override some properties during the runtime We declare the Aerospike container as GenericContainer because the library does not provide out of box support for the database Then inside the initialize method we retrieve the container s host and port and assign them to the aerospike hosts property Apache Hive UtilitiesBefore each test method we are suppose to delete all data from Apache Hive and add new rows required for the current scenario So tests won t affect each other Let s declare a custom test facade for Apache Hive Take a look at the code snippet below TestComponentpublic class TestHiveUtils Autowired private SparkSession sparkSession Autowired private InitHive initHive public void cleanHive initHive dropTables initHive createTables public lt T E extends HiveTable lt T gt gt E insertInto Function lt SparkSession E gt tableFunction return tableFunction apply sparkSession There are just two methods The cleanHive drops all existing and creates them again Therefore all previous data is erased The insertInto is tricky It serves the purpose of inserting new rows to Apache Hive in a statically typed way How is that done First of all let s inspect the HiveTable lt T gt interface public interface HiveTable lt T gt void values T t As you see it s a regular Java functional interface Though the implementations are not so obvious public class SubscriberInfo implements HiveTable lt SubscriberInfo Values gt private final SparkSession session public static Function lt SparkSession SubscriberInfo gt subscriberInfo return SubscriberInfo new Override public void values Values values for Values value values session sql format insert into s values s s media subscriber info value subscriberId value msisdn public static class Values private String subscriberId private String msisdn public Values setSubscriberId String subscriberId this subscriberId subscriberId return this public Values setMsisdn String msisdn this msisdn msisdn return this The class accepts SparkSession as a constructor dependency The SubscriberInfo Values are the generic argument The class represents the data structure containing values to insert And finally the values implementation performs the actual new row creation The key is the subscriberInfo static method What s the reason to return Function lt SparkSession SubscriberInfo gt Its combination with TestHiveUtils insertInto provides us with statically typed INSERT INTO statement Take a look at the code example below hive insertInto subscriberInfo values new 
SubscriberInfo Values setMsisdn msisdn setSubscriberId subscriberId new SubscriberInfo Values setMsisdn msisdn setSubscriberId subscriberId An elegant solution don t you think Spark Integration Test SliceSpring integration tests require a specific configuration It s wise to declare it once and reuse it Take a look at the code snippet below SpringBootTest classes SparkConfig class SparkContextDestroyer class AerospikeConfig class PropertiesConfig class InitHive class TestHiveUtils class TestAerospikeFacade class EnricherServiceTestConfiguration class ActiveProfiles LOCAL public class SparkIntegrationSuite extends IntegrationSuite Inside the SpringBootTest we have listed all the beans that are used during tests running TestAerospikeFacade is just a thin wrapper around the Java Aerospike client for test purposes Its implementation is rather straightforward but you can check out the source code by this link The EnricherServiceTestConfiguration is the Spring configuration declaring all implementations for the EnricherService interface Take a look at the example below TestConfigurationpublic class EnricherServiceTestConfiguration Bean public EnricherService subscriberEnricherService SparkSession session AerospikeProperties aerospikeProperties return new SubscriberIdEnricherService session aerospikeProperties I want to point out that all EnricherService implementations should be listed inside the class If we apply different configurations for each test suite the Spring context will be reloaded Mostly that s not a problem But Apache Spark usage brings obstacles You see when JavaSparkContext is created it starts the local Apache Spark node But when we instantiate it twice during the application lifecycle it will result in an exception The easiest way to overcome the issue is to make sure that JavaSparkContext will be created only once Now we can get to the testing process Integration Test ExampleHere is a simple integration test that inserts two rows to Apache Spark and checks that the corresponding two records are created in Aerospike within seconds Take look at the code snippet below class SubscriberIdEnricherServiceIntegrationTest extends SparkIntegrationSuite Autowired private TestHiveUtils hive Autowired private TestAerospikeFacade aerospike Autowired private EnricherService subscriberEnricherService BeforeEach void beforeEach aerospike deleteAll my namespace hive cleanHive Test void shouldSaveRecords hive insertInto subscriberInfo values new SubscriberInfo Values setMsisdn msisdn setSubscriberId subscriberId new SubscriberInfo Values setMsisdn msisdn setSubscriberId subscriberId subscriberEnricherService proceedEnrichment List lt KeyRecord gt keyRecords await atMost TEN SECONDS until gt aerospike scanAll my namespace hasSize assertThat keyRecords allOf hasRecord subscriberId msisdn hasRecord subscriberId msisdn If you tune everything correctly the test will pass The whole test source is available by this link ConclusionThat s basically all I wanted to tell you about testing Apache Hive Apache Spark and Aerospike integration with Spring Boot usage As you can see the Big Data world is not so complicated after all All code examples are taken from this repository You can clone it and play around with tests by yourself If you have any questions or suggestions please leave your comments down below Thanks for reading ResourcesRepository with examplesHDFS Hadoop Distributed File System Apache HiveApache SparkApache DerbyAerospike DatabaseAerospike Connect FrameworkJava Stream APISpring 
InitializrSpring profilesTestcontainersGradle plugin shadow jar |
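As a rough illustration of the per-partition pattern the guide above describes (opening one heavy, non-serializable client per Dataset partition inside foreachPartition rather than one per row, and declaring serialVersionUID explicitly), here is a minimal, hypothetical Java sketch. The class, table, and column names are placeholders, not the repository's actual code.

    import java.io.Serializable;
    import org.apache.spark.api.java.function.ForeachPartitionFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Hypothetical sketch: select rows with Spark SQL, then create one heavy
    // client per partition inside foreachPartition instead of one per row.
    public class EnrichmentSketch implements Serializable {
        private static final long serialVersionUID = 1L; // declared explicitly, as the guide recommends

        private final transient SparkSession session; // not needed on the workers

        public EnrichmentSketch(SparkSession session) {
            this.session = session;
        }

        public void proceedEnrichment() {
            Dataset<Row> rows = session.sql(
                "SELECT subscriber_id, msisdn FROM media.subscriber_info "
                    + "WHERE subscriber_id IS NOT NULL AND msisdn IS NOT NULL");

            // The cast selects the Java-friendly overload of foreachPartition.
            rows.foreachPartition((ForeachPartitionFunction<Row>) partition -> {
                // One client per partition, created on the worker itself,
                // so it never has to be serialized from the driver.
                try (ExampleClient client = new ExampleClient()) {
                    partition.forEachRemaining(row -> {
                        String subscriberId = row.getAs("subscriber_id");
                        String msisdn = row.getAs("msisdn");
                        client.put(subscriberId, msisdn);
                    });
                }
            });
        }

        // Placeholder standing in for the real data-store client (e.g. Aerospike).
        static class ExampleClient implements AutoCloseable {
            void put(String key, String value) { /* write key/value to the target store */ }
            @Override
            public void close() { /* release the connection */ }
        }
    }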
2022-04-22 20:22:58 |
Overseas TECH |
DEV Community |
14 examples of clean code and JavaScript code shortening |
https://dev.to/stefanyrepetcki/14-exemplos-de-codigo-limpo-e-encurtamento-de-codigo-javascript-1m3f
|
Exemplos de código limpo e encurtamento de código Javascript Obtendo vários elementos do DOM const a document getElementById a const b document getElementById b const c document getElementById c const d document getElementById d const a b c d a b c d map document getElementById Exportando várias variáveisexport const foo export const bar export const kip export const foo bar kip Atribuir um valor àmesma coisa condicionalmente usando operadores ternários a gt b foo maça foo banana foo a gt b maça banana Atribuir condicionalmente o mesmo valor a uma propriedade de objeto específica c gt d a foo maça a bar maça const key c gt d foo bar const a key maça Declarar e atribuir variáveis das propriedades do objeto const a foo x b foo y const x a y b foo Declarar e atribuir variáveis de índices de array let a foo b foo let a b foo Use operadores lógicos para condicionais simples Link útil if foo facaAlgo OUfoo amp amp facaAlgo Passando parâmetros condicionalmente if foo foo maça bar foo kip bar foo maça kip Lidar com muitos s const SALARIO const TAXA const SALARIO const TAXA ouconst SALARIO e const TAXA e Atribuindo a mesma coisa a várias variáveis a d b d c d a b c d Simplifique a lógica de uma condição if fruta if banana return comer banana if fruta amp amp banana return comer banana Devolução Antecipada Não utilize a palavra else Uma das regras do clean code if fruta maca else banana if fruta return maca return banana Os métodos de encadeamento devem retornar o mesmo tipo ou o mesmo contexto pessoa trim retorna string Aceitávelpessoa trim toUpperCase retorna string Ainda aceitávelpessoa trim toUpperCase replace STEFANY username return string Ainda aceitávelpessoa trim toUpperCase replace STEFANY username split array de retorno violou a regra de um ponto por linha Não abrevie nome de variáveislet i const n stefany let idade const nome stefany Espero que tenha ajudado Deixo aqui meu linkedin e github ️ |
2022-04-22 20:12:57 |
Overseas TECH |
DEV Community |
How to create a list of suggestions for your HTML Input field |
https://dev.to/senadmeskin/how-to-create-a-list-of-suggestions-for-your-html-input-field-5cnl
|
How to create a list of suggestions for your HTML Input field. If you want a predefined list of suggestions for your input field using just simple HTML, then the <datalist> HTML element is for you. We will create a list of cities that will be offered as suggestions for our input field. To create the list we use the <datalist> element, which holds a list of items and an id that will be used as a reference to our list:

    <datalist id="listOfCities">
      <option value="Bugojno"></option>
      <option value="New York"></option>
      <option value="London"></option>
      <option value="Peking"></option>
    </datalist>

Now we create an input field and connect it to our list with the list attribute:

    <input type="text" id="city" list="listOfCities">

Now when we start typing in our input field, the suggestions will be loaded. CODEPEN |
2022-04-22 20:02:02 |
Overseas TECH |
DEV Community |
Digging into Postgresql and DEV |
https://dev.to/devteam/digging-into-postgresql-and-dev-3e43
|
Digging into Postgresql and DEVEarlier today I was reviewing the draft Community Wellness badge pull request and with my head deep in SQL these days I thought I d give a go at crafting a query to create this logic The following query finds the user IDs and weeks since today in which a user has written at least two comments that don t have a negative moderator reaction user id The user s database ID weeks ago The number of weeks since today in which we re grouping comments number of comments with positive reaction How many positive reaction comments did they have for the weeks ago SELECT user id COUNT user id as number of comments with positive reaction Get the number of weeks since today for posts trunc extract epoch FROM current timestamp created at AS weeks agoFROM comments Only select comments from the last weeks that don t have a negative moderator reaction INNER JOIN Find all comments in the last weeks SELECT DISTINCT reactable id FROM reactions WHERE reactable type Comment AND created at gt now interval day Omit any comments that got a negative moderator reaction EXCEPT SELECT DISTINCT reactable id FROM reactions WHERE reactable type Comment AND created at gt now interval day AND category IN thumbsdown vomit AS positve reactions ON comments id positve reactions reactable idINNER JOIN Find the users who have at least two comments in the last week SELECT count id AS number of comments user id AS comment counts user id FROM comments WHERE created at gt now interval day GROUP BY user id AS comment counts ON comments user id comment counts user id AND comment counts number of comments gt Don t select anything older than days ago or weeks ago WHERE created at gt now interval day GROUP BY user id weeks agoThe above query creates multiple rows per user id Which is fine but if you want to loop through things you ll need to bust out some temporary variable magic I was wondering if I d be able to get this down to one query With the help of some folks at Forem I wrote the following query aggregates that information for you but you need to do some assembly work The columns are user id The user s database ID serialized weeks ago A comma separated list of the weeks in which we had comments weeks ago array An array of integers that is the non string representation of serialized weeks ago we want to see how ActiveRecord handles this array of integers It s a the simpler version of the serialized weeks ago serialized comment counts A comma separated list of the number of comments The first number of the serialized weeks ago maps to the first number of the serialized comment counts And you get one row per user SELECT user id A comma separated string of weeks ago array to string array agg weeks ago AS serialized weeks ago Will active record convert this to an array of integers array agg weeks ago AS weeks ago array A comma separated string of comment counts The first value in this string happens on the week that is the first value in serialized weeks ago array to string array agg number of comments with positive reaction AS serialized comment countsFROM This is the same query as the first example query SELECT user id COUNT user id as number of comments with positive reaction Get the number of weeks since today for posts trunc extract epoch FROM current timestamp created at AS weeks agoFROM comments Only select comments from the last weeks that don t have a negative moderator reaction INNER JOIN Find all comments in the last weeks SELECT DISTINCT reactable id FROM reactions WHERE reactable type Comment 
AND created at gt now interval day Omit any comments that got a negative moderator reaction EXCEPT SELECT DISTINCT reactable id FROM reactions WHERE reactable type Comment AND created at gt now interval day AND category IN thumbsdown vomit AS positve reactions ON comments id positve reactions reactable idINNER JOIN Find the users who have at least two comments in the last week SELECT count id AS number of comments user id AS comment counts user id FROM comments WHERE created at gt now interval day GROUP BY user id AS comment counts ON comments user id comment counts user id AND comment counts number of comments gt Don t select anything older than days ago or weeks ago WHERE created at gt now interval day GROUP BY user id weeks ago AS user comment counts by week GROUP BY user idI am eager to share these Postgresql approaches as they can help circumvent running lots of smaller queries I also had the chance to pair up with two folks to make sure we wrote the correct logic and it was performant enough |
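As a sketch of the "assembly work" mentioned above, the following hypothetical JDBC snippet (Java, rather than the Rails code DEV itself uses) shows one way the aggregated columns could be unpacked on the client side: the array_agg column arrives as a java.sql.Array of integers, and the comma-separated counts string is split and matched up by position. The connection details and the VALUES stand-in data are placeholders; only the column aliases follow the post.

    import java.sql.Array;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CommentCountsSketch {
        // Self-contained stand-in for the post's aggregate query: VALUES rows
        // replace the real comments/reactions subquery, but the aggregation and
        // output column aliases mirror the ones described above.
        private static final String QUERY =
            "SELECT user_id, "
                + "array_agg(weeks_ago) AS weeks_ago_array, "
                + "array_to_string(array_agg(n), ',') AS serialized_comment_counts "
                + "FROM (VALUES (1, 0, 3), (1, 1, 2), (2, 0, 5)) AS t(user_id, weeks_ago, n) "
                + "GROUP BY user_id";

        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost:5432/dev", "dev", "dev");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(QUERY)) {

                while (rs.next()) {
                    long userId = rs.getLong("user_id");

                    // array_agg(...) comes back as a SQL array of integers.
                    Array weeksAgoArray = rs.getArray("weeks_ago_array");
                    Integer[] weeksAgo = (Integer[]) weeksAgoArray.getArray();

                    // The comma-separated counts line up with the weeks by position.
                    String[] counts = rs.getString("serialized_comment_counts").split(",");

                    for (int i = 0; i < weeksAgo.length; i++) {
                        System.out.printf("user %d: %s comments %d week(s) ago%n",
                                userId, counts[i], weeksAgo[i]);
                    }
                }
            }
        }
    }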
2022-04-22 20:01:30 |
Overseas TECH |
Engadget |
Judge dismisses most claims in Sony gender discrimination lawsuit |
https://www.engadget.com/sony-playstation-gender-discrimination-lawsuit-claims-dismissed-204003968.html?src=rss
|
Judge dismisses most claims in Sony gender discrimination lawsuit. A gender discrimination lawsuit against Sony has run into significant hurdles. Axios has learned that judge Laurel Beeler dismissed most of plaintiff Emma Majo's claims due to multiple issues. Majo didn't provide enough evidence to make a case in some instances, Beeler said, while in others she incorrectly asserted that promotions and demotions constituted harassment. Majo first sued Sony in November over allegations of institutional discrimination. The former PlayStation security analyst accused Sony of firing her for discussing sexism she reportedly encountered at the company. Sony tried to have the suit tossed out due to both vague details and a lack of corroborating claims, but the case gathered momentum in March when eight other women joined in and raised the potential for class action status. The judge will still allow three claims surrounding wrongful termination and violations of whistleblower protections, however, and she rejected Sony's attempt to block any chance of class action status. As the other claims were dismissed without prejudice, Majo is free to revisit them if and when she can better support them. Sony denied Majo's discrimination allegations, but it also said in March that it would take the women's complaints "seriously." As it stands, the partial dismissal clearly isn't what the company wanted: it still has to face potentially grave implications and may be pressured to join companies like Activision Blizzard in reforming its internal culture. |
2022-04-22 20:40:03 |
Overseas TECH |
Engadget |
Tesla can now insure your EV in Colorado, Oregon and Virginia |
https://www.engadget.com/tesla-insurance-colorado-oregon-virginia-200217835.html?src=rss
|
Tesla can now insure your EV in Colorado, Oregon and Virginia. Tesla's in-house insurance is now available in three more states. As Forbes notes, Tesla revealed during its latest earnings call that its "real-time" insurance has reached Colorado, Oregon and Virginia. The automaker has also filed paperwork in Nevada with plans to offer insurance as early as June, although nothing has been announced so far. As in some other states, the insurance determines your premiums based on driving behavior rather than standard criteria like age and credit. Tesla examines the safety scores from its EVs and looks for signs of aggressive habits that might lead to incidents, such as collision warnings, hard braking and tailgating. This rewards better driving and, of course, keeps you buying Tesla vehicles. The company eventually plans to offer insurance across the entire US. Whether or not that goes smoothly is unclear: Tesla offers insurance in California, but it's still seeking permission to use real-time info. It could be a while before the insurance and its signature feature are consistently available. |
2022-04-22 20:02:17 |
Cisco |
Cisco Blog |
5 Eco-Friendly Tips for a Greener Learning Journey |
https://blogs.cisco.com/learning/5-eco-friendly-tips-for-a-greener-learning-journey
|
certification |
2022-04-22 20:31:23 |
News |
BBC News - Home |
Ukraine round-up: Russia admits Moskva ship losses for first time |
https://www.bbc.co.uk/news/world-europe-61193787?at_medium=RSS&at_campaign=KARANGA
|
ukraine |
2022-04-22 20:31:12 |
Business |
Diamond Online - New Articles |
The merits and faults of the "not pandering to the times" shop-floor principle that Toray President Nikkaku spoke of - Toray's Betrayal |
https://diamond.jp/articles/-/301646
|
executive officer system |
2022-04-23 05:25:00 |
Business |
Diamond Online - New Articles |
[Held April 26] Mercari's CEO will also take the stage! An online event for getting to know the 20 most-watched startups - Announcement from Diamond Inc. |
https://diamond.jp/articles/-/302224
|
|
2022-04-23 05:22:00 |
Business |
Diamond Online - New Articles |
When your boss brags about "old war stories," what is the right way to react? - The Art of "Super" Strategic Listening |
https://diamond.jp/articles/-/301674
|
taberu rayu (chili-oil topping) |
2022-04-23 05:20:00 |
Business |
Diamond Online - New Articles |
With the "super weak yen" now past 129 yen to the dollar, here is what investors absolutely must not do - Money Talk: What You Absolutely Must Not Do "Now" |
https://diamond.jp/articles/-/302229
|
foreign exchange market |
2022-04-23 05:17:00 |
Business |
Diamond Online - New Articles |
Is Minamoto no Yoshitsune overrated? A "certain figure" whose achievements are overshadowed and little known [History / Catch-up Feature] - Catch-up Feature |
https://diamond.jp/articles/-/302182
|
streaming |
2022-04-23 05:15:00 |
Business |
Diamond Online - New Articles |
What idea do JR Kyushu's luxury sightseeing train "Seven Stars" and Starbucks have in common? - A Case-Study Course on "Business Models and Strategy" |
https://diamond.jp/articles/-/302181
|
pricing |
2022-04-23 05:10:00 |
Business |
Diamond Online - New Articles |
Get a "spring-only" goshuin stamp at a Kyoto shrine! Blessings for business success and passing exams too - Chikyu no Arukikata News & Reports |
https://diamond.jp/articles/-/301999
|
The deities enshrined at shrines are remarkably varied: gods of love, gods of financial luck, gods of health and longevity, gods of academic success, and more. |
2022-04-23 05:05:00 |
Hokkaido |
Hokkaido Shimbun |
Muroran, shaken by remarks about character, heads for a draw to close the matter; after top-level talks, "the fruitless exchanges will end"; city councilor: "the lowest as a human being" / mayor on Twitter: "it hurt my feelings" |
https://www.hokkaido-np.co.jp/article/672993/
|
draw |
2022-04-23 05:18:09 |
Hokkaido |
Hokkaido Shimbun |
<Editorial> With the long holiday period approaching, keep up the guard against COVID |
https://www.hokkaido-np.co.jp/article/673036/
|
extended holidays |
2022-04-23 05:05:00 |
Hokkaido |
Hokkaido Shimbun |
April 30 and May 1 named "Hokkaido Rugby Days"; hopes for using Sapporo Dome after the Nippon-Ham Fighters' move |
https://www.hokkaido-np.co.jp/article/672971/
|
Nippon-Ham |
2022-04-23 05:03:49 |
Hokkaido |
Hokkaido Shimbun |
Fried half young chicken now sold from a frozen-food vending machine, a first in the Okhotsk region, at the Engaru roadside station |
https://www.hokkaido-np.co.jp/article/672972/
|
fried half chicken |
2022-04-23 05:01:34 |
Hokkaido |
Hokkaido Shimbun |
Milking robot installed in a tie-stall barn at Hokuren's Kunneppu demonstration farm, a first in the Okhotsk region |
https://www.hokkaido-np.co.jp/article/672973/
|
Kunneppu Town |
2022-04-23 05:01:30 |
Business |
Toyo Keizai Online |
Why the growing "ditch the masks" debate is so dangerous: it only invites a pointless split between the "necessary" and "unnecessary" camps | COVID-19: The Chaos of a Long Battle | Toyo Keizai Online |
https://toyokeizai.net/articles/-/584240?utm_source=rss&utm_medium=http&utm_campaign=link_back
|
too dangerous |
2022-04-23 05:40:00 |