Persistent BigData Analytics Library

Over the last few years, Persistent has worked on a wide range of Big Data projects. Our experience has shown that Hadoop and NoSQL technologies provide the most flexible and scalable programming environment for processing and managing very large volumes of data. However, there are challenges during development. Persistent's Big Data Analytics Library (Pebal) was developed to help customers address the most common issues in Big Data application development. Pebal is a library of ready-made functions that are commonly required in developing Big Data applications.
Big Data applications run on the MapReduce framework. When an application is ported to this framework, existing library functions often cannot be reused as-is and may have to be rewritten. Further, without a library that packages the functions commonly used in Big Data applications, each developer has to write these functions from scratch. As a result, development and deployment take longer, and overall code quality and performance suffer. To help mitigate these issues, Persistent has developed a library of commonly used functions: the Persistent BigData Analytics Library (Pebal).

What is Pebal?

The Persistent BigData Analytics Library (Pebal) is a set of prebuilt functions that can be used as building blocks for rapid application development. These high-performance, easy-to-use algorithms apply to several commonly required operations in text analytics, web analytics, indexing, sets and graphs.




Pebal has four main groups of functions and algorithms that are frequently required in Big Data application development:

Function Group       Use Case
Graphs               Web graphs, people-connection graphs, road networks, genome sequences
Sets                 Shopping carts
Text Analytics       Web pages, tweets, blogs
Indexing/Searching   Text mining, information retrieval


These function groups include smart algorithms such as:
  • Smarter TopN algorithm: finds the top N items in a large data set. It is faster and more efficient than sorting the entire data set because it performs fewer comparisons
  • Smart Dictionary Extraction algorithm: handles spelling mistakes, variations, and short forms in documents that may contain typos
  • Sensitive Data algorithm: masks sensitive information such as names, company names, and personal information in sensitive documents such as financial reports and medical prescriptions
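
To illustrate why a top-N selection needs fewer comparisons than a full sort, here is a minimal Python sketch. It is not Pebal's implementation, which this page does not describe. The function names `top_n` and `merge_top_n` are hypothetical, and the bounded min-heap approach is an assumption chosen because it is a common way to do this in a MapReduce-style job: each partition keeps only its N best items, and the partial results are then merged.

```python
import heapq
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]  # (key, score), e.g. (word, count)


def top_n(items: Iterable[Pair], n: int) -> List[Pair]:
    """Return the n highest-scoring pairs, best first, without a full sort."""
    if n <= 0:
        return []
    heap: List[Tuple[int, str]] = []  # min-heap that never holds more than n entries
    for key, score in items:
        entry = (score, key)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new entry beats the current n-th best, so replace it.
            heapq.heapreplace(heap, entry)
    # Only the n surviving entries are sorted, not the whole input.
    return [(key, score) for score, key in sorted(heap, reverse=True)]


def merge_top_n(partials: Iterable[List[Pair]], n: int) -> List[Pair]:
    """Combine per-partition top-n lists into a global top-n (the reduce step)."""
    return top_n((pair for part in partials for pair in part), n)


# Hypothetical data split across two partitions, as mappers would see it.
partition_a = [("a", 5), ("b", 1), ("c", 9)]
partition_b = [("d", 7), ("e", 3)]
result = merge_top_n([top_n(partition_a, 2), top_n(partition_b, 2)], 2)
print(result)
```

For m items, this approach takes on the order of m log N comparisons, against m log m for sorting everything, so the saving grows as N becomes small relative to the data set.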
The Pebal Advantage
  • Faster Program Development: Similar to the Standard Template Library (STL) used in C++, Pebal functions significantly reduce the time to develop and deploy Big Data solutions, from a few months to a few weeks
  • Optimized algorithms enable the development of high-performance, efficient code
  • Customizable Hadoop Application: Facilitates data conversion, data filtering, graph building, etc.
  • Increased Productivity: Pebal helps Hadoop application developers throughout the workflow by converting data formats, building data structures, extracting text and pre-computing summaries
  • Accurate Analysis: Pebal helps data analysts analyze unstructured data faster and gain quick insights
Pebal Features:
  • Pebal functions are generic and easy to learn, configure and use
  • Compatibility: Pebal is currently compatible with IBM InfoSphere BigInsights r1.3. It is expected to be compatible with other Big Data platforms in the future
  • Global Applications: Pebal is comprehensive. It includes a range of functions (graphs, text analytics, sets, etc.) used in various Big Data applications
  • Efficient: All Pebal functions are tested and optimized for performance. Pebal functions also recommend data layout on disk, making Pebal an efficient tool for Big Data application development
  • Multiple applications: Pebal functions can be used in a variety of Big Data applications because they are schema agnostic, easy to wrap in scripts and easy to configure
 

Dr. Mukund Deshpande, AVP and Head of the BI & Analytics Business Unit, talks about the BigData Analytics Library (Pebal)
