practicebion.blogg.se

Bulk loading





  1. #Bulk loading update#
  2. #Bulk loading full#

Bulk, that is, disassembled or bulky cargo, is among the most transported cargo types. Bulk cargoes are cargoes that do not require packaging, are liquid, or can take the shape of the area in which they are located. Generally, basic products fall into the bulk category: fertilizer, grain, animal feed, rubber, liquefied gas, all kinds of beverages and petroleum are included in this class. In addition, pulleys, bobbins and cast wood products, which are construction materials, are also included, and unprocessed mineral products are another example of bulk cargo.

High tonnage is of great importance in bulk logistics; for this reason, large ships are preferred in this type of transportation. Loads in crates, parcels, containers or barrels are loaded onto the ships with the help of cranes and placed in the warehouse areas of the ship. Bulk cargo transportation, which dominates more than half of the logistics market, plays a major role in trade between continents. As one of the most traditional methods of maritime transportation, it has become an indispensable part of international logistics. Dry bulk logistics, in particular, ensures that vital products such as grain reach many parts of the world in high tonnages.

Bulk cargo transportation is also very economical: goods can be transported in their own parcels, drums or in the ship's warehouse, so the carrier does not need to spend anything on additional packaging, and high-tonnage products can be loaded without extra packaging costs. This provides a high degree of time savings and shows how effective maritime transportation is. Bulk containers are drums and pulleys made of wood.

#Bulk loading update#

Why do you have to avoid using Insert/Update when a huge amount of data is processed, or when you are limited by time? The step's documentation states: "The Insert/Update step first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated."

As the quote states, for each row in the stream the step will execute two queries: a lookup first, and then an update or insert. The source code of PDI Kettle shows that a PreparedStatement is used for all of these queries: insert, update and lookup. So if this step is the bottleneck, try to figure out what exactly is slow (see the sketch after this list):

  • Is the lookup slow? (Run the lookup query manually on the database on sample data. Is it slow? Do the lookup fields have an index on the columns used to find the corresponding row in the database?)
  • Is the update slow? (Run the update query manually on the database on sample data. Is it slow? Does the update's where clause use the index on the lookup fields?)
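For example, both checks can be run in psql; the table target_tbl and the key column customer_id are hypothetical names, not taken from the original transformation:

    -- Run inside a transaction that is rolled back, because
    -- EXPLAIN ANALYZE on an UPDATE really executes the update.
    BEGIN;

    -- Is the lookup slow? Time the lookup query on sample data.
    EXPLAIN ANALYZE
    SELECT * FROM target_tbl WHERE customer_id = 12345;

    -- Is the update slow? Check that its WHERE clause hits an index.
    EXPLAIN ANALYZE
    UPDATE target_tbl SET amount = 0 WHERE customer_id = 12345;

    ROLLBACK;

    -- If either plan shows a sequential scan, index the lookup column:
    CREATE INDEX idx_target_tbl_customer_id ON target_tbl (customer_id);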
Anyway, this step is slow by design, since it requires a lot of network communication and data processing in Kettle. The only way to make it substantially faster is to load all of the data into a "temp" table in the database and call a function which will upsert the data, or just use a simple SQL step in the job to do the same.
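A minimal sketch of that temp-table approach, assuming PostgreSQL 9.5+ for INSERT ... ON CONFLICT; the table target_tbl and its columns are hypothetical names, and the upsert is shown inline rather than wrapped in a function:

    -- Hypothetical target table; ON CONFLICT needs a unique constraint
    -- (here the primary key on customer_id).
    CREATE TABLE IF NOT EXISTS target_tbl (
        customer_id integer PRIMARY KEY,
        amount      numeric
    );

    -- Staging ("temp") table with the same shape as the target.
    CREATE TEMP TABLE staging (LIKE target_tbl);

    -- ... bulk-load the whole stream into staging here ...

    -- One set-based upsert instead of two round-trips per row.
    INSERT INTO target_tbl (customer_id, amount)
    SELECT customer_id, amount FROM staging
    ON CONFLICT (customer_id)
    DO UPDATE SET amount = EXCLUDED.amount;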


#Bulk loading full#

Data source: a DBF file with over 2_215_000 rows. PDI Kettle was restarted on each run, to avoid heavy CPU load from GC caused by the huge amount of rows. Note also that Kettle PDI and the database were located on the same host; if the hosts are different, network bandwidth can play some role in performance. The results are underneath to help you make a decision:

  • Bulk loader: average over 150_000 rows per second, around 13-15s in total.
  • Table output (sql inserts): average 11_500 rows per second.
  • Table output (batch inserts, batch size 10_000): average 28_000 rows per second.
  • Table output (batch inserts in 5 threads, batch size 3_000): average 7_600 rows per second per thread.
  • Table output (batch inserts in 5 threads, batch size 10_000): average 12_500 rows per second per thread, which means around 60_000 rows per second in total.

With batch inserts, after around 1_600_000 rows memory was full and GC started; the CPU was then loaded up to 100% and speed slowed down significantly, for a total time of around 59s. The memory provided to the JVM has to be enough to hold the batched data, so I set the JVM heap size to 1024mb (it can be tweaked in the variable PENTAHO_DI_JAVA_OPTIONS) and increased the batch size. It is worth playing with the batch size to find the value which provides the best performance (bigger is better), but past some level it causes GC overhead.
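A sketch of the corresponding shell setup before launching the PDI scripts; the variable name PENTAHO_DI_JAVA_OPTIONS is from the post, while the exact -Xms/-Xmx flags are my assumption:

    # Give the JVM a 1024mb heap so batches fit in memory,
    # matching the value used in the test above.
    export PENTAHO_DI_JAVA_OPTIONS="-Xms1024m -Xmx1024m"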
The advantage of the Bulk loader is that it doesn't fill the memory of the JVM: all data is streamed into the psql process immediately.
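The streaming into psql suggests that the speed comes from PostgreSQL's COPY protocol rather than per-row statements; that mechanism is my assumption, as are the file and table names in this hand-rolled equivalent:

    -- psql's \copy reads a client-side file and sends it to the server
    -- over the COPY protocol in one continuous stream.
    \copy target_tbl (customer_id, amount) FROM 'data.csv' WITH (FORMAT csv)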


