Algorithms reveal forecasting power of tweets

Published in From the WWW on September 15th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Forecasting power of tweets

Sang Won Yoon had a good Chinese meal recently — not always easy in America. It’s on his mind. Imagine, he says, that you and your co-workers plan via social media to head for lunch about 12:30 p.m. most Thursdays. Usually it’s that Italian place downtown, and you frequently tweet about traffic on the way.

Now imagine that at 10 a.m., you’re tweeted a coupon from the Chinese place near the Italian joint — and directions around a traffic jam that will start in about 90 minutes. Score one Sichuan hot pot.

By Todd R. McAdam. Published on September 10, 2014. Reproduced from / Read the full article at Binghamton University

Yoon can make that happen. He and fellow Binghamton University systems scientist Sarah Lam have been working with Binghamton alumnus Nathan Gnanasambandam, a senior researcher at the Palo Alto Research Center (PARC), a division of Xerox Research. They used 500 million tweets to develop algorithms that not only paint a picture of everyday human dynamics, but can predict an individual’s behavior hours in advance. The team, which also included graduate students Keith Thompson and Bichen Zheng, recently published their findings in Industrial Engineer.

Need expertise with Cloud / Internet Scale computing / Hadoop / Big Data / Algorithms / Architectures? Contact me - I can help; this is one of my primary areas of expertise.

Think about what your typical social media post says about you: when you posted, where you were. Your networking relationships can be learned, and, with context-based algorithms like those PARC and Binghamton University have developed, so can what you plan. The algorithms use what is called an artificial neural network.

How sure are they? Better than 90 percent for a typical social media user in a three-hour horizon. “If you look at the picture, it’s very static. But the individuals are all over the place,” Yoon says.
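The article does not include code, but the flavor of the approach can be pictured with a small feed-forward network. The sketch below is a toy, not the team’s model: the features and training data are hypothetical, and scikit-learn’s MLPClassifier stands in for whatever network the researchers actually used.

    # Toy sketch (not the published model): predict "goes to the usual lunch
    # spot within 3 hours" from features mined from a user's posting history.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Hypothetical features per observation: hour of day, day of week,
    # recent food mentions, recent traffic mentions.
    X = np.array([
        [10, 3, 1, 1],   # Thursday morning, food + traffic tweets
        [10, 3, 1, 0],
        [15, 1, 0, 0],   # Tuesday afternoon, no relevant tweets
        [9,  5, 0, 1],
    ])
    y = np.array([1, 1, 0, 0])  # 1 = went to the venue within 3 hours

    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    model.fit(X, y)

    # 10 a.m. on a Thursday with food and traffic chatter: likely headed out.
    print(model.predict_proba([[10, 3, 1, 1]]))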

Some people are very careful about what data they give out, but the algorithms can work pretty well with anonymized data. Usable predictions can be made more than 60 percent of the time, if the right data are aggregated. And that data isn’t just coming from social media: Think about sources such as credit card transactions, monitored telephone calls, e-mail, GPS data.

Creepy, perhaps, but this type of analysis also has benefits. Xerox, which has funded and participated in the team’s ongoing research, can apply the tools to traffic. (It helps run the New York State Thruway’s E-ZPass system and parking services in several cities across the country.) Imagine getting directions during an emergency that not only get you out of harm’s way, but get you to someplace personal where you’re safe, reducing the burden on emergency shelters. Or imagine directions that prevent a traffic jam, rather than simply route you around one.

Now apply that research tool to call and contact centers, which Xerox also runs. These methods can fuse data from call centers, online chat and e-mail help desks. “We give it structure — not all feeds have structure,” says Gnanasambandam, who is also a visiting professor in Binghamton’s department of systems science and industrial engineering.

“What if you call a company…” Yoon says, and Lam completes: “… And they know why you’re calling before you call?”

Help desk associates can be cross-trained in topics so they face less downtime, or calls could be routed faster to the best specialist. Data about problems can be analyzed in near-real time, perhaps allowing fixes to be made before the customer realizes there’s a problem. “That’s not too far away from what’s happening,” Gnanasambandam says.

Now direct this approach toward healthcare — which provides about $2 billion of Xerox’s annual business — and researchers can build tools to help patients, doctors, hospitals, insurers and pharmaceutical companies better understand the complexities of public health or ferret out prescription or Medicaid fraud.

“There’s a lot of different directions you can go,” Lam says.

Including to Yoon’s next Chinese meal.

Reproduced from / Read the full article at Binghamton University

Stinger.next: Enterprise SQL at Hadoop Scale with Apache Hive – Hortonworks

Published in From the WWW on September 14th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Need expertise with Cloud / Internet Scale computing / Hadoop / Big Data / Algorithms / Architectures? Contact me - I can help; this is one of my primary areas of expertise.

Apache Hive, Stinger.next and Enterprise SQL at Hadoop Scale

The Stinger Initiative enabled Hive to support an even broader range of use cases at truly big data scale, bringing it beyond its batch roots to support interactive queries, all with a common SQL access layer.

Stinger.next is a continuation of this initiative, focused on further enhancing the speed, scale and breadth of SQL support to enable truly real-time access in Hive, while also adding support for transactional capabilities. And just as with the original Stinger Initiative, this will be addressed through a familiar three-phase delivery schedule and developed completely in the open Apache Hive community.

Stinger.next Project Goals

  • Speed: Deliver sub-second query response times.
  • Scale: The only SQL interface to Hadoop designed for queries that scale from gigabytes to terabytes and petabytes.
  • SQL: Enable transactions and SQL:2011 analytics for Hive.

Hive has always been the de facto standard for SQL in Hadoop, and these advances will accelerate the production deployment of Hive across a much wider array of scenarios. Specifically, key deliverables that will enable these new business applications of Hive include:

  • Transactions with ACID semantics allow users to easily modify data with inserts, updates and deletes. They extend Hive from a traditional write-once, read-often system to one that supports analytics over changing data. This enables reporting with occasional corrections and modifications, and allows operational reporting with periodic bulk updates from an operational database.
  • Sub-second queries will allow users to deploy Hive for interactive dashboards and exploratory analytics that have more demanding response-time requirements.
  • SQL:2011 analytics allows rich reporting to be deployed on Hive faster, more simply and more reliably using standard SQL. A powerful cost-based optimizer ensures complex queries and tool-generated queries run fast. Hive now provides the full expressive power that enterprise SQL users have enjoyed, but at Hadoop scale.

Transactions with ACID semantics in Hive

Hive has been used as a write-once, read-often system: users add partitions of data and query them often. ACID is a major shift in this paradigm, adding SQL transactions that let users insert, update and delete existing data. This opens up a much wider set of use cases that require periodic modifications to existing data. In upcoming releases, ACID will add BEGIN, COMMIT and ROLLBACK for multi-statement transactions.
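For flavor, here is a hedged sketch of what ACID DML looks like against a transactional table of this era, driven from Python via PyHive. The host and table names are made up, and the exact table requirements (bucketed ORC storage, the transactional property) should be checked against your Hive release.

    # Hedged sketch: Hive ACID DML from Python via PyHive (hypothetical names).
    from pyhive import hive

    conn = hive.Connection(host="hive-server.example.com", port=10000)
    cur = conn.cursor()

    # ACID tables in this era had to be clustered (bucketed) ORC tables
    # with the 'transactional' property set.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (id INT, status STRING)
        CLUSTERED BY (id) INTO 4 BUCKETS
        STORED AS ORC
        TBLPROPERTIES ('transactional'='true')
    """)

    cur.execute("INSERT INTO orders VALUES (1, 'open')")
    cur.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")  # correction
    cur.execute("DELETE FROM orders WHERE status = 'cancelled'")      # cleanup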

Sub-Second Queries with Hive LLAP

Sub-second queries require fast query execution and low setup cost. The challenge for Hive is to achieve this without giving up the scale and flexibility that users depend on. This requires a new approach using a hybrid engine that leverages Tez and something new called LLAP (Live Long and Process, #llap online).

LLAP is an optional daemon process running on multiple nodes that provides the following:

  • Caching and data reuse across queries with compressed columnar data in-memory (off-heap)
  • Multi-threaded execution including reads with predicate pushdown and hash joins
  • High throughput IO using Async IO Elevator with dedicated thread and core per disk
  • Granular column level security across applications

YARN will provide workload management in LLAP by using delegation. Queries will bring information from YARN to LLAP about their authorized resource allocation. LLAP processes will then allocate additional resources to serve the query as instructed by YARN.

The hybrid engine approach provides fast response times through efficient in-memory data caching and low-latency processing, provided by node-resident processes. However, by limiting LLAP use to the initial phases of query processing, Hive sidesteps the limitations around coordination, workload management and failure isolation that are introduced by running the entire query within such a process, as other databases do.

. . .

. . . Continue reading the full article from Hortonworks

. . .

Reproduced/Read the original from Hortonworks

Revolution in Progress: The Networked Economy – MIT Technology Review

Published in From the WWW on September 10th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

No question about it: The Networked Economy is the next economic revolution. In the coming years, it will offer unprecedented opportunities for businesses and improve the lives of billions worldwide. In fact, the revolution is already under way.

“Over the last few decades, we’ve grown beyond the industrial economy to the IT economy and the Internet economy, each of which led to significant inflection points in growth and prosperity,” says Vivek Bapat, SAP’s global vice president for portfolio and strategic marketing. “Now we’re looking at the Networked Economy.” This new economy, resulting from a convergence of the economies that came before it and catalyzed by a new era of hyperconnectivity, is creating spectacular new opportunities for innovation.

And, like any revolution, the Networked Economy is going to be big. Very big.

“Over the next 10 to 15 years, it has the potential to double the size of the gross world product,” Bapat says. SAP estimates that the Networked Economy will represent an economic value of at least $90 trillion.

Three Questions — and Answers — About the Networked Economy. By MIT Technology Review Custom on August 27, 2014 | In Partnership with SAP.

Read the full article from MIT Review

Breakthrough results on small cluster open the door for new IoT applications and real-time data analysis – MapR

Published in From the WWW on September 10th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Hadoop

MapR Performance Benchmark Exceeds 100 Million Data Points Per Second Ingest. MapR Technologies, Inc., provider of the top-ranked distribution for Apache™ Hadoop®, today announced at the Tableau Conference breakthrough performance results achieved using standard open-source software, OpenTSDB, running on the MapR Distribution. Using only four nodes of a 10-node cluster, the MapR Distribution, with its in-Hadoop NoSQL database MapR-DB, ingested over 100 million data points per second.

By accelerating OpenTSDB performance by 1,000 times on such a small cluster, MapR opens the door to cost-effectively managing massive volumes of data and enabling new applications such as the Internet of Things (IoT) and other real-time data analysis applications, including industrial monitoring of manufacturing facilities, predictive maintenance of distributed hardware systems, and datacenter monitoring.
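For context, OpenTSDB accepts time-series points over (among other interfaces) an HTTP API. A minimal sketch of writing a single data point, with made-up host, metric, and tag names (the benchmark itself used batched writes at far higher volume):

    # Hedged sketch: writing one data point to OpenTSDB's HTTP /api/put
    # endpoint (host, metric, and tag names are hypothetical).
    import time
    import requests

    point = {
        "metric": "factory.line1.temperature",
        "timestamp": int(time.time()),
        "value": 72.5,
        "tags": {"plant": "binghamton", "sensor": "t-104"},
    }
    resp = requests.post("http://opentsdb.example.com:4242/api/put", json=point)
    resp.raise_for_status()  # /api/put returns 204 No Content on success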

Need expertise with cloud / big data / internet scale computing / MapReduce / Hadoop? Contact me - I can help! This is a core expertise area.

“The accelerated performance for OpenTSDB validates the differentiated efficiency and scale that MapR brings to the table,” said Ted Dunning, chief application architect for MapR Technologies. “OpenTSDB is a widely used database intended to store and analyze time-series data. Originally designed only for data center monitoring, it saw the expansion of its use limited by poor ingest performance. This benchmark demonstrates a viable option for new applications, such as IoT and other real-time data-analysis applications, using OpenTSDB running on MapR.”

According to estimates from Cisco, there will be approximately 50 billion connected devices by 2020. These IoT devices include sensors and other embedded data-capturing devices that communicate information continuously, pushing the boundaries of traditional data management platforms. Healthcare, manufacturing and utilities are examples of industries where decisions based on continuous data analysis can improve business operations. These devices will be phoning home and sending data, and time-series databases will be critical for storing and analyzing these data sets.

MapR has published the details required to replicate these tests in the MapR App Gallery and on GitHub.

Read the original news/reproduced from MapR

High-performance computing crossing the barriers between clouds achieved – PHYS.org

Published in From the WWW on September 8th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

The Information Technology Research Institute of the National Institute of Advanced Industrial Science and Technology (AIST) has developed a technology with which, once an environment for high-performance computing has been established, a virtual cluster-type computer can easily be built on a different cloud and made available for immediate use.

Virtual cluster-type computer built on different clouds

Generally, high-performance computing uses cluster-type computers, in which many computers are bundled and run as a single computer, but their hardware configurations are not uniform. Clouds, on the other hand, provide virtual computers that do not depend on hardware configuration, and by bundling these together a virtual cluster-type computer can be created. Until now, though, the user had to re-install software and redo the settings for each different cloud.

Therefore, a technology to build a virtual cluster-type computer based on the design concept of “Build Once, Run Everywhere” has been developed. Once the environment to run the application has been established, it may be run on any cloud, be it private, commercial, or other. Furthermore, since there are no constraints on the number of virtual computers that can be incorporated into the cluster, when computing power is insufficient, an even larger virtual cluster-type computer can be formed on another cloud that allows the use of even more virtual computers, while being used in exactly the same manner.

A virtual cluster-type computer was formed on AIST’s private cloud, AIST Super Green Cloud (ASGC), and the ability to use it on Amazon EC2, a commercial cloud, was verified. With this technology, users and application fields that could not use high-performance computing previously can now use high-performance computing. Thus, the developed technology is expected to contribute to the enhancement of industrial competitiveness.

need expertise with cloud / big data / internet scale computing / mapreduce / hadoop  etc. ? contact me - i can help! – this is a core expertise area.

There are many research organizations and companies that require high-performance computing, such as in automobile development and drug discovery. Conventionally, each organization prepared cluster-type computers in-house. Solving problems that exceeded the installed computing capacity required introducing a system with even higher performance, which was not readily available when it was needed.

Figure 1: Usage conceptual diagram of a virtual cluster-type computer

In the clouds widely available today, computing performance can be increased by adding computers, bundling virtual computers to form a cluster-type computer. However, re-creating a built environment on a different cloud required re-installing software and reconfiguring settings, costing extra time, labor, and money.

Furthermore, because the initial introduction and operating costs of cluster-type computers are high, small- and medium-sized enterprises in particular could not maintain an environment for high-performance computing. Expanding the fields in which high-performance computing can be applied, in support of such users, is required to enhance industrial competitiveness.

AIST is conducting R&D toward a high-performance computing infrastructure that combines high computing performance with the convenience of running on any cluster-type computer once an application-executing environment has been created. This R&D follows the concept of separating the application-executing environment from the actual machines through virtualization, using cloud technologies to establish cluster-type computers on various clouds as required. Virtualization, however, has long carried a performance penalty that has hindered its adoption in high-performance computing. The effects of virtualization on high-performance computing applications were therefore evaluated in detail, and technologies to reduce the performance deterioration it causes have been developed.

The developed technology comprises two main elements: a mechanism to build and share a virtual computer image file that does not depend on the cloud, and a mechanism to build a virtual cluster-type computer from that image file. Using these, a virtual cluster-type computer with enough computers to match the scale of the computation can be built on various clouds. Specifically, a virtual computer image file containing the software and settings common to all the virtual computers is transferred to as many virtual computers as the user requires, and by setting the information determined when the system is established, such as the IP addresses that identify each computer and the list of computers, a single cluster-type computer is formed.

Software required for the development, execution, and monitoring of high-performance computing applications is pre-installed in the virtual computer image file, so the user can start immediately. Because the user can freely customize the image file, a personalized application-executing environment can be built (Fig. 1). This time, a virtual cluster-type computer was built on ASGC, and using the same virtual computer image file, an even larger virtual cluster-type computer was built on Amazon EC2 and verified to be usable in exactly the same manner.
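As a toy illustration of that second mechanism (not AIST’s code; all names invented), the launch-time step boils down to injecting per-cluster data, such as each node’s IP address and the full node list, into instances booted from one shared image:

    # Toy sketch of the "Build Once, Run Everywhere" idea: one common image,
    # with per-cluster settings injected at launch time (names hypothetical).

    def render_cluster_settings(node_ips):
        """Produce the per-node data each instance needs: its own address
        plus the full machine list (e.g., for an MPI-style hostfile)."""
        hostfile = "\n".join(node_ips)
        return [{"my_ip": ip, "hostfile": hostfile} for ip in node_ips]

    # The same image file could be launched on a private cloud or on EC2;
    # only the injected settings differ.
    for settings in render_cluster_settings(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(settings["my_ip"], "sees",
              len(settings["hostfile"].splitlines()), "nodes")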

Furthermore, the technology for reducing virtualization performance deterioration was developed and integrated into ASGC, and its effects were verified. Evaluation using High Performance LINPACK, a benchmark for high-performance computing, yielded 6.77 TFLOPS for a non-virtualized cluster-type computer using 16 computers, against 6.40 TFLOPS for a virtualized one, an approximately 5% deterioration in performance. Performance deterioration depends on the characteristics of the application, but this has been confirmed to be within an acceptable range for practical use.

Currently available clouds provide only single virtual computers; no virtual cluster-type computers are offered. Although there is software to build a virtual cluster-type computer on Amazon EC2, a commercial cloud, it is not designed to form an identical virtual cluster-type computer on a different cloud. As far as we know, no high-performance computing cloud service other than the developed technology enables building virtual cluster-type computers across different clouds. The developed technology allows multiple clouds to be used as needed and opens a path toward a new high-performance computing infrastructure.

The researchers will cooperate with academic clouds formed by universities and public organizations in Japan, as well as with research communities abroad, to conduct field tests of virtual cluster-type computers, and will deploy the technology as an operational service on ASGC. Furthermore, the technology will be transferred to cloud service providers to expand the users and application fields of high-performance computing.

Read the original article from / reproduced from PHYS.org

Quality of Service in Hadoop – ebay Tech Blog

Published in From the WWW on September 7th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

At eBay we run Hadoop clusters comprising thousands of nodes that are shared by thousands of users. We analyze data on these clusters to gain insights for improved customer experience. In this post, we look at distributing RPC resources fairly between heavy and light users, as well as mitigating denial of service attacks within Hadoop. By providing appropriate response times and increasing system availability, we offer a better Hadoop experience.

By CHRIS LI on 08/21/2014, in Data Infrastructure and Services.
Reproduced from eBay Tech Blog

Problem: namenode slowdown

In our clusters, we frequently deal with slowness caused by heavy users, to the point of namenode latency increasing from less than a millisecond to more than half a second. In the past, we fixed this latency by finding and terminating the offending job. However, this reactive approach meant that damage had already been done—in extreme cases, we lost cluster operation for hours.

Need expertise with cloud / big data / internet scale computing / MapReduce / Hadoop? Contact me - I can help! This is a core expertise area.

This slowness is a consequence of the original design of Hadoop. In Hadoop, the namenode is a single machine that coordinates HDFS operations in its namespace. These operations include getting block locations, listing directories, and creating files. The namenode receives HDFS operations as RPC calls and puts them in a FIFO call queue for execution by reader threads. The dataflow (diagrammed in the original post) runs from clients, to the FIFO call queue, to the reader threads that execute the calls.

Though FIFO is fair in the sense of first-come, first-served, it is unfair in the sense that users who perform more I/O operations on the namenode will be served more than users who perform fewer. The result is the aforementioned latency increase.

We can see the effect of heavy users in the namenode audit logs on days when we get support emails complaining about HDFS slowness (audit-log chart in the original post).

Solution: quality of service

Taking inspiration from routers—some of which include QoS (quality of service) capabilities—we replaced the FIFO queue with a new type of queue, which we call the FairCallQueue.

The scheduler places incoming RPC calls into a number of queues based on the call volume of the user who made the call. The scheduler keeps track of recent calls, and prioritizes calls from lighter users over calls from heavy users.

The multiplexer controls the penalty of being in a low-priority queue versus a high-priority queue. It reads calls in a weighted round-robin fashion, preferring to read from high-priority queues and infrequently reading from the lowest-priority queues. This ensures that high-priority requests are served first, and prevents starvation of low-priority RPCs.

The multiplexer and scheduler are connected by a multi-level queue; together, these three form the FairCallQueue. In our tests at scale, we’ve found the queue is effective at preserving low latencies even in the face of overwhelming denial-of-service attacks on the namenode.
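As a rough sketch of these mechanics (a toy model, not the Hadoop implementation; the thresholds and weights here are invented), the scheduler bins callers by recent call volume and the multiplexer drains the queues in weighted round-robin order:

    # Toy model of the FairCallQueue mechanics: a scheduler bins callers by
    # recent call volume; a weighted round-robin multiplexer reads mostly
    # from high-priority queues without starving any.
    from collections import Counter, deque

    NUM_LEVELS = 4
    WEIGHTS = [8, 4, 2, 1]          # reads per round, per priority level
    queues = [deque() for _ in range(NUM_LEVELS)]
    recent_calls = Counter()

    def schedule(user, call):
        """Heavier recent callers land in lower-priority (higher-index) queues."""
        recent_calls[user] += 1
        level = min(recent_calls[user] // 10, NUM_LEVELS - 1)
        queues[level].append((user, call))

    def multiplex():
        """Drain all queues, preferring high-priority ones each round."""
        while any(queues):
            for level, weight in enumerate(WEIGHTS):
                for _ in range(weight):
                    if queues[level]:
                        yield queues[level].popleft()

    # A heavy user floods the namenode; a light user arrives last but is
    # still served ahead of most of the heavy user's backlog.
    for i in range(50):
        schedule("heavy-user", f"getBlockLocations-{i}")
    schedule("light-user", "listStatus")
    order = [user for user, _ in multiplex()]
    print(order.index("light-user"), "of", len(order))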

This plot shows the latency of a minority user during three runs with a FIFO queue (QoS disabled) and with the FairCallQueue (QoS enabled). As expected, the latency is much lower when the FairCallQueue is active. (Note: the spikes are caused by garbage-collection pauses, which are a separate issue.)

Open source and beyond

The 2.4 release of Apache Hadoop includes the prerequisites for namenode QoS. With this release, cluster owners can swap the implementation of the RPC call queue at runtime and choose to leverage the new FairCallQueue. You can try the patches on Apache’s JIRA: HADOOP-9640.
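Swapping in the queue is a namenode configuration change. As a hedged sketch (the property name follows the HADOOP-9640 line of work, with the namenode’s RPC port embedded in the key; verify against your Hadoop version’s documentation):

    # Hedged sketch: the core-site.xml property that swaps in the
    # FairCallQueue, shown as a Python dict. 8020 is assumed to be the
    # namenode RPC port; the key name embeds whatever port yours uses.
    core_site = {
        "ipc.8020.callqueue.impl": "org.apache.hadoop.ipc.FairCallQueue",
    }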

The FairCallQueue can be customized with other schedulers and multiplexers to enable new features. We are already investigating future improvements, such as weighting different RPC types for more intelligent scheduling and allowing operators to manually control which queues certain users are scheduled into. In addition, there are features submitted by the open source community that build upon QoS, such as RPC client backoff and fair-share queuing.

With namenode QoS in place, we have improved our users’ experience of our Hadoop clusters by providing faster and more uniform response times to well-behaved users while minimizing the impact of poorly written or badly behaved jobs. This in turn allows our analysts to be more productive and focus on the things that matter, like making your eBay experience a delightful one.

– Chris Li

eBay Global Data Infrastructure Analytics Team

Reproduced from eBay Tech Blog

Context sensitive information: Which bits matter in data? – MSR

Published in From the WWW on September 6th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Speaker: Joachim Buhmann. Affiliation: ETH Zurich. Host: Pushmeet Kohli. Duration: 00:51:06

Learning patterns in data requires extracting interesting, statistically significant regularities from (large) data sets, e.g., detecting cancer cells in tissue microarrays and estimating their staining, or role mining in security permission management. Admissible solutions or hypotheses specify the context of pattern-analysis problems, which have to cope with model mismatch and noise in the data. An information-theoretic approach is developed that estimates the precision of inferred solution sets and regularizes solutions in a noise-adapted way.

The tradeoff between “informativeness” and “robustness” is mirrored by the balance between high information content and identifiability of solution sets, thereby giving rise to a new notion of context-sensitive information. Cost functions to rank solutions and, more abstractly, algorithms are treated as noisy channels with a generalization capacity.

The effectiveness of this concept is demonstrated by model validation for spectral clustering based on different variants of graph cuts. The concept also enables us to measure how many bits are extracted by sorting algorithms when the input, and thereby the pairwise comparisons, are subject to fluctuations.

Read the original / reproduced from MSR

Need expertise with machine learning / data analytics / cloud / big data / internet scale computing / MapReduce / Hadoop? Contact me - I can help! This is a core expertise area.

Big data tamed with the cloud – MSR

Published in From the WWW on September 6th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Big data: it’s the hot topic these days, promising breakthroughs in just about every field, from medicine to marketing to machine learning and more. But for many of us, the problems of managing big data hit home when we confront the welter of digital photos and videos we have recorded with our smartphones and cameras. Multiply this by the number of people doing this around the world and it is a big problem. On the surface, it does not seem like an endeavor on the order of treating cancer (more on that later), but it is a colossal headache to organize, classify, search, and retrieve our multimedia content—and designing systems to do this at scale effectively is a huge challenge.

Read the original article/reproduced from MSR

Thankfully, Professor Heiko Schuldt and Ivan Giangreco of the Databases and Information Systems (DBIS) Group at the University of Basel are working on a project to do just that, and a whole lot more. Their integrated system harnesses the power of the cloud, through Microsoft Azure, to understand and sort through the terabytes of data that make up multimedia content to find and return like objects.

Need expertise with cloud / big data / internet scale computing / MapReduce / Hadoop? Contact me - I can help! This is a core expertise area.

The Basel team’s system combines the power of relational databases with the adaptability of information retrieval systems. The Basel system can handle and store any type of multimedia data, including their features. When an algorithm for feature extraction is defined, the system automatically executes the extraction, storage, and indexing of both the feature data and the object itself. This approach efficiently carries out Boolean queries as well as searches that rank images by their feature similarity scores. In addition, it provides novel query paradigms and interfaces; for example, you can sketch an image or parts thereof and find images that are similar to your sketch.
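As a toy sketch of the retrieval side of such a system (not the Basel code; the feature extractor here is just a color histogram standing in for a real one), similarity search reduces to extracting a feature vector per image and ranking stored vectors by a similarity score:

    # Toy sketch of feature-based similarity ranking (hypothetical extractor):
    # extract a feature vector per image, then rank by cosine similarity.
    import numpy as np

    def extract_features(image):
        """Stand-in for a real extractor: a normalized intensity histogram."""
        hist, _ = np.histogram(image, bins=16, range=(0, 256))
        return hist / max(hist.sum(), 1)

    def rank_by_similarity(query_vec, index):
        """Return image ids ordered by cosine similarity to the query."""
        def cosine(a, b):
            return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        return sorted(index, key=lambda k: cosine(query_vec, index[k]),
                      reverse=True)

    rng = np.random.default_rng(0)
    index = {i: extract_features(rng.integers(0, 256, size=(32, 32)))
             for i in range(5)}
    query = extract_features(rng.integers(0, 256, size=(32, 32)))
    print(rank_by_similarity(query, index))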

It’s exciting to see how this work has progressed since the Basel researchers attended our first European Microsoft Azure for Research training workshop at ETH Zurich last November. They successfully applied for an Azure Award, which got them up and running on the cloud within a few weeks. This allowed the team to quickly develop and deploy their system in a scalable way. Microsoft Azure is ideal as a fast, distributed storage and computing fabric for running the Basel team’s project, whose MapReduce-style program can grow as millions of images are added to the system. By moving to the cloud, the Basel researchers have been able to develop, deploy, and demonstrate the system, testing their ideas at scale on the 14 million images that comprise the ImageNet database. They presented this work at the IEEE International Congress on Big Data (BigData 2014).

Professor Schuldt explains how Azure has helped him with his research. “In large-scale image retrieval, both effectiveness and efficiency are essential requirements. Thanks to Microsoft’s support and the use of the Azure cloud, we have been able to successfully address the retrieval efficiency so that we can concentrate further on retrieval effectiveness, especially by developing novel search paradigms and user interfaces based, for instance, on gestures or sketches.”

The Basel researchers are looking forward to tackling the even bigger Bing Clickture dataset, which contains 40 million images. They also plan to test the system on video content, in what they’re calling the IMOTION project, which will “multiply the challenges in terms of retrieval efficiency,” notes Professor Schuldt. Their next paper was presented at the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, and we’re looking forward to seeing how the team continues to push the boundaries of big data by using Microsoft Azure.

Now back to that earlier comment about treating cancer. Approaches similar to those used by the Basel team’s project might, in fact, someday help us to better understand and treat cancer. The underlying computer science and cloud technologies could be used, for example, for managing and analyzing MRI scans of tumors.

The Basel team’s project is just one example of how easy it is to get up and running on the cloud and accelerate your research, especially when taking advantage of the Microsoft Azure for Research initiative, which offers not only training but also substantial grants of Azure storage and compute resources for qualified projects. Read about the initiative and our requests for proposals. Who knows? Maybe your project will be the next big thing in big data.

Kenji Takeda, Solutions Architect and Technical Manager, Microsoft Research

Read the original article/reproduced from MSR

First graphene-based flexible display produced – PHYS.org

Published in From the WWW on September 6th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Active matrix electrophoretic display incorporating graphene. Credit: Plastic Logic

(Phys.org) —A flexible display incorporating graphene in its pixels’ electronics has been successfully demonstrated by the Cambridge Graphene Centre and Plastic Logic, the first time graphene has been used in a transistor-based flexible device.

The partnership between the two organisations combines the graphene expertise of the Cambridge Graphene Centre (CGC) with the transistor and display processing steps that Plastic Logic has already developed for flexible electronics. This prototype is a first example of how the partnership will accelerate the commercial development of graphene, and a first step towards the wider implementation of graphene and graphene-like materials in flexible electronics.

Read the full/original/reproduced from PHYS.org

Graphene is a two-dimensional material made up of sheets of carbon atoms. It is among the strongest, most lightweight and flexible materials known, and has the potential to revolutionise industries from healthcare to electronics. The new prototype is an active matrix electrophoretic display, similar to the screens used in today’s e-readers, except it is made of flexible plastic instead of glass. In contrast to conventional displays, the pixel electronics, or backplane, of this display includes a solution-processed graphene electrode, which replaces the sputtered metal electrode layer within Plastic Logic’s conventional devices, bringing product and process benefits.

Graphene is more flexible than conventional ceramic alternatives like indium tin oxide (ITO) and more transparent than metal films. The ultra-flexible graphene layer may enable a wide range of products, including foldable electronics. Graphene can also be processed from solution, bringing the inherent benefits of more efficient printed and roll-to-roll manufacturing approaches.

The new 150 pixel per inch (150 ppi) backplane was made at low temperatures (less than 100°C) using Plastic Logic’s Organic Thin Film Transistor (OTFT) technology. The graphene electrode was deposited from solution and subsequently patterned with micron-scale features to complete the backplane.

For this prototype, the backplane was combined with an electrophoretic imaging film to create an ultra-low power and durable display. Future demonstrations may incorporate liquid crystal (LCD) and organic light emitting diodes (OLED) technology to achieve full colour and video functionality. Lightweight flexible active-matrix backplanes may also be used for sensors, with novel digital medical imaging and gesture recognition applications already in development.

“We are happy to see our collaboration with Plastic Logic resulting in the first graphene-based electrophoretic display exploiting graphene in its pixels’ electronics,” said Professor Andrea Ferrari, Director of the Cambridge Graphene Centre. “This is a significant step forward to enable fully wearable and flexible devices. This cements the Cambridge graphene-technology cluster and shows how an effective academic-industrial partnership is key to help move graphene from the lab to the factory floor.”

“The potential of graphene is well-known, but industrial process engineering is now required to transition graphene from laboratories to industry,” said Indro Mukerjee, CEO of Plastic Logic. “This demonstration puts Plastic Logic at the forefront of this development, which will soon enable a new generation of ultra-flexible and even foldable electronics.”

This joint effort between Plastic Logic and the CGC was also recently boosted by a grant from the UK Technology Strategy Board, within the ‘realising the graphene revolution’ initiative. This will target the realisation of an advanced, full-colour, OLED-based display within the next 12 months.

Read the full/original/reproduced from PHYS.org

Oracle Big Data SQL lines up Database with Hadoop, NoSQL frameworks

Published in From the WWW on July 15th, 2014 | Comments Off

Reproduced and/or syndicated content. All content and images are copyright the respective owners.

Hadoop

Hadoop continues to operate as a looming influence in the world of big data, and that holds true with the unveiling of the next step in Oracle’s big data roadmap. Oracle’s latest big idea for big data aims to eliminate data silos with new software connecting the dots between the Oracle Database, Hadoop and NoSQL. From ZDNet’s Between the Lines.

Need expertise with cloud / internet scale computing / MapReduce / Hadoop? Contact me - I can help! This is a core expertise area.

Read the original / reproduced from ZDNet

The Redwood Shores, Calif.-headquartered corporation introduced Oracle Big Data SQL, SQL-based software streamlining data movement between the Oracle Database and the NoSQL and Hadoop frameworks.

The approach is touted to minimize data movement, which could translate into faster number-crunching performance while also reducing security risks for data in transit.

Big Data SQL promises to be able to query any and all kinds of structured and unstructured data. Oracle Database’s security and encryption features can also be blanketed over Hadoop and NoSQL data.

Beyond extending enterprise governance, Oracle connected plenty of dots within its own portfolio. Big Data SQL runs on Oracle’s Big Data Appliance and is set up to play well with the tech titan’s flagship Exadata database machine. The Big Data SQL engine also borrows familiar portfolio elements, such as Exadata’s Smart Scan technology for local data queries.

The Big Data Appliance itself was built on top of Oracle’s cloud distribution, which has been in the works for the last three years.

Neil Mendelson, vice president of big data and advanced analytics at Oracle, told ZDNet on Monday that enterprise customers are still facing the following three obstacles: managing integration and data silos, obtaining the right people with new skill sets or relying on existing in-house talent, and security.

“Over this period of time working with customers, they’re really hitting a number of challenges,” Mendelson posited. He observed much of what customers are doing today is experimental in nature, but they’re now ready to move on to the production stage.

Thus, Mendelson stressed, Big Data SQL is designed to let users issue a single query that can run against data in Hadoop and NoSQL, individually or in any combination.
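Oracle had not yet published full documentation at announcement time; the sketch below is a guess at the shape of the external-table pattern Big Data SQL builds on (the ORACLE_HIVE access driver), with every object name, connect string, and access parameter hypothetical:

    # Hedged sketch (all names hypothetical): a Hive table surfaced in Oracle
    # as an external table, then joined with a native Oracle table in one query.
    import cx_Oracle  # assumes the Oracle client libraries are installed

    DDL = """
    CREATE TABLE web_clicks_ext (user_id NUMBER, url VARCHAR2(4000))
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_HIVE DEFAULT DIRECTORY default_dir
      ACCESS PARAMETERS (com.oracle.bigdata.tablename: logs.web_clicks)
    ) REJECT LIMIT UNLIMITED
    """

    QUERY = """
    SELECT c.name, COUNT(*)
    FROM customers c JOIN web_clicks_ext w ON c.id = w.user_id
    GROUP BY c.name
    """

    conn = cx_Oracle.connect("scott/tiger@db.example.com/orcl")
    cur = conn.cursor()
    cur.execute(DDL)                 # one-time setup of the external table
    for row in cur.execute(QUERY):   # one SQL statement spans both stores
        print(row)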

“Oracle has taken some of its intellectual property and moved it on to the Hadoop cluster, from a database perspective,” Mendelson explained.

In order to utilize Big Data SQL, Oracle Database 12c is required first. Production is slated to start in August/September, and pricing will be announced when Big Data SQL goes into general availability.

Also on Tuesday, the hardware and software giant was expected to ship a slew of security updates fixing more than 100 vulnerabilities across hundreds of versions of its products.

That follows a blog post on Monday penned by Oracle’s vice president of Java product management, Henrik Stahl, aimed at clarifying the future of Java support on Windows XP.

He dismissed claims that Oracle would prevent Java updates from being applied to systems running the older version of Windows, or that Java would stop working on XP altogether.

Nevertheless, Stahl reiterated Oracle’s previous stance that users still running Windows XP should upgrade to a currently supported operating system.

Read the original / reproduced from ZDNet

© all content copyright respective owners