Azure Stream Analytics Team Blog

Azure Streaming Units - how do I monitor? how do I scale?


Our CSS team does a fabulous job of relaying customer questions to us so that we can draft documentation to address common issues. Bill Carroll, Senior Escalation Engineer, reached out to us today with this question.

I have a customer who is monitoring the SU% reported in the management web app's monitoring widget. He sees that the stream processes inputs and the SU% goes to 80%. When the inputs stop (no incoming events), the SU% stays at 80% until he stops and restarts the job. He let it sit for 8 hours with no incoming events, but the SU% remained at 80%. He is under the impression that the high SU% will negatively affect future job performance.

Questions:
• Is there a description of SU% and what type of behavior a customer should expect from this metric?
• What kind of baseline SU% can be expected on a single streaming unit when no events are incoming?
• Does the SU% include the memory consumption of a container in the backend, and will it not decrease until garbage collection occurs in the JVM?
• How is the SU% determined?

If you are looking for answers to these questions, please refer to our documentation here:

https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-monitoring/

80% is right at the boundary of using up all memory resources. We currently don’t aggressively reclaim memory as long as utilization stays under 80%. Once utilization goes over 80%, we recommend adding more SUs. You can use up to 6 SUs without partitioning the query. If you need more resources, you have to scale out the query, which requires telling us how to partition it using the PARTITION BY keyword, as described here: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-scale-jobs/
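For illustration, here is a minimal sketch of a partitioned query. The input, output, and column names (input1, output1, PartitionId) are placeholders, not from the question above; the partition column must match how your input is actually partitioned.

-- Hedged sketch only: input1, output1 and PartitionId are placeholder names.
-- PARTITION BY lets the job scale out beyond 6 SUs by processing each partition independently.
SELECT
    PartitionId,
    COUNT(*) AS eventCount,
    System.Timestamp AS windowEnd
INTO output1
FROM input1 PARTITION BY PartitionId
GROUP BY PartitionId, TumblingWindow(second, 10)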

If you have questions like this and would like us to build documentation, please tweet to us at @AzureStreaming.

 


Learn more about Azure Stream Analytics Time Skew Policies


In Stream Analytics, every data stream event has a timestamp associated with it. Because events are temporal in nature and, by default, the timestamp is assigned from the time of arrival of the event, there are considerations for both the tolerance of out-of-order events and the late arrival of events to the Stream Analytics job. Contributors to late arrival and out-of-order events vary, but they are generally one or more of the following:

• Producers of the events have clock skews. This is common when the producers run on different machines, so they have different clocks.

• Network delay from the producers sending the events to Event Hub.

• Clock skews between Event Hub partitions. This is also a factor because we first sort events from all Event Hub partitions by event enqueue time, and then examine the degree of disorder.
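By default the arrival (enqueue) time is used as the event timestamp; if your events carry their own application time, you can assign it with TIMESTAMP BY, which is when the out-of-order and late-arrival tolerances described above come into play. A minimal sketch, assuming a hypothetical input named input1 with an eventTime field in the payload:

-- Hedged sketch: input1 and eventTime are placeholder names.
-- TIMESTAMP BY makes the query use the application time carried in the payload
-- instead of the arrival (enqueue) time.
SELECT
    System.Timestamp AS windowEnd,
    COUNT(*) AS eventCount
FROM input1 TIMESTAMP BY eventTime
GROUP BY TumblingWindow(second, 10)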

Read about Azure Stream Analytics Time Skew Policies here

 

Why did my Azure Stream Analytics services get into "Degraded" status?


At Azure Stream Analytics we are asked this question from time to time: why is my Stream Analytics service showing the status "Degraded"? We are bringing back an old post to answer this :)

Users want to know:
- what can cause the "Degraded" status
- is there a way to set notifications or alarms on Stream Analytics (status change, too many events, etc.)
- is there a way to examine Stream Analytics logs and see what caused the problem

The answers to the above questions can be found here: http://blogs.msdn.com/b/streamanalytics/archive/2015/06/29/intro-to-diagnostics-for-azure-stream-analytics.aspx

Why aren’t my ASA job results showing up in Azure SQL Server table? How do I debug?


We often receive questions from users of Azure Stream Analytics that help us improve our documentation or error handling. These Q&As also help other users. Here is one such case of figuring out why ASA results may not be appearing in an Azure SQL Server table.

Question:


I’m having trouble outputting my ASA job results into an Azure SQL Server table. The schema of the ASA output and the SQL Server table match perfectly, with datatypes and field names as per the instructions I’ve read. I’m still getting an error when trying to start the job, however. Can someone point out what I’m doing wrong? Or is there a step-by-step walkthrough or instructions somewhere on how to use SQL Server as an output correctly?

Error: Stream Analytics job has validation errors: The output output used in the query was not defined. Activity Id: 'activity id string'.

Here is my ASA query:

Select
CAST(Text As nvarchar(max)) as text,
Cast (CreatedAt as datetime) as createdat,
(case Cast(sentimentscore as nvarchar(max))
when 2 then 0
when 0 then -10
when 4 then 10 else sentimentscore
end) as sentimentscore,
 
(case CAST(topic as nvarchar(max))
when 'XBox' then 'Xbox'
else topic end) as topic,
 
(case CAST(sentimentscore AS nvarchar(max))
when '0' then 'Bad :('
when '2' then 'Neutral'
when '4' then 'Good :)' else sentimentscore end) as sentiment
 from TwitterSteam

 

Solution by our team member Zhong Chen:

You may have a message in your ops log similar to the following. We have often noticed that people have trouble locating the relevant logs when a job fails. We are working on improving that experience. Meanwhile, please carefully sift through the Error logs in the relevant time range.
{"Message Time":"2016-02-02 22:12:00Z","Error":"Comparison is not allowed for operands of type 'nvarchar(max)' and 'bigint' in expression 'case Cast ( sentimentscore as nvarchar ( max ) ) when 2 then 0 when 0 then - 10 when 4 then 10 else sentimentscore end'.\u000d\u000a","Message":"Runtime exception occurred while processing events, Comparison is not allowed for operands of type 'nvarchar(max)' and 'bigint' in expression 'case Cast ( sentimentscore as nvarchar ( max ) ) when 2 then 0 when 0 then - 10 when 4 then 10 else sentimentscore end'.\u000d\u000a, : OutputSourceAlias:twitter2;","Type":"SqlRuntimeError","Correlation ID":"connection id string"}

 

If you have questions like this one, please reach out to us on Twitter: @AzureStreaming

 

 

Pulling data from either public or private sources to Azure Event Hub where the data is exposed as a web service or feed


In typical Internet of Things (IoT) scenarios, you have devices that you can program to push data to Azure, either to an Azure Event Hub or an IoT hub. Both of those hubs are entry points into Azure for storing, analyzing, and visualizing with a myriad of tools made available on Microsoft Azure. However, they both require that you push data to them, formatted as JSON and secured in specific ways. This brings up the following question. What do you do if you want to bring in data from either public or private sources where the data is exposed as a web service or feed of some sort, but you do not have the ability to change how the data is published? Consider the weather, or traffic, or stock quotes - you can't tell NOAA, or WSDOT, or NASDAQ to configure a push to your Event Hub. To solve this problem, Spyros Sakellariadis and Dinar Gainitdinov have written and open-sourced a small cloud sample that you can modify and deploy that will pull the data from some such source and push it to your Event Hub. From there, you can do whatever you want with it, subject, of course, to the license terms from the producer. You can find the application here.

Check out the technical documentation and the GenericWebToEH solution code to try it out on some of your favorite data feeds.

For other scenarios there are a number of documentation articles and code samples on pushing data from devices you control to Azure and for analyzing in combination with other streaming or static data.


Handling Json array in Stream Analytics Query


This week, I am sharing a query question asked on Stack Overflow to illustrate how to handle a JSON array in a Stream Analytics query:

 

Problem

 

I retrieve some weather data from an external API. This is returned as JSON and sent to an Azure IoT hub. Stream Analytics processes the JSON into a proper format, but I have a problem here.

The element Current_Condition is an array. It always has one element at the [0] position. I only need to get the data of the array at that very first position, without a filter on things like id, etc.

Here is the complete data:

{
  "deviceId": "aNewDevice",
  "data": {
    "data": {
      "current_condition": [
        {
          "cloudcover": "0",
          "FeelsLikeC": "0",
          "FeelsLikeF": "32",
          "humidity": "100",
          "observation_time": "10:00 AM",
          "precipMM": "0.0",
          "pressure": "1020",
          "temp_C": "2",
          "temp_F": "36",
          "visibility": "0",
          "weatherCode": "143",
          "weatherDesc": [{"value": "Fog, Mist"}],
          "weatherIconUrl": [{"value": "http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0006_mist.png"}],
          "winddir16Point": "SSW",
          "winddirDegree": "210",
          "windspeedKmph": "7",
          "windspeedMiles": "4"
        }
      ],
      "request": [
        {
          "query": "Nijmegen, Netherlands",
          "type": "City"
        }
      ]
    }
  }
}

 

Solution:

 

You need to use the GetArrayElement function. For example:

SELECT GetRecordProperty(GetArrayElement(Current_Condition,0),'humidity')

To make it a bit nicer you can split query into 2 steps:

WITH CurrentConditions AS
(
    SELECT
        deviceId,
        GetArrayElement(Current_Condition, 0) AS conditions
    FROM input
)
SELECT
    deviceID,
    conditions.humidity
FROM CurrentConditions
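One thing to watch for: in the sample payload above, current_condition is nested under data.data rather than at the top level, so depending on how the events actually arrive, the array may need to be addressed through that path. A hedged sketch, assuming the payload lands exactly as shown and an input named "input":

-- Hedged sketch: assumes the nested shape shown above; adjust the path to match your input.
WITH CurrentConditions AS
(
    SELECT
        i.deviceId,
        GetArrayElement(i.data.data.current_condition, 0) AS conditions
    FROM input i
)
SELECT
    deviceId,
    conditions.humidity
FROM CurrentConditions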
 

How to deal with missing events in streaming data?


Streaming data is often not perfect – some events can be missing, and some can be generated or received with delay. At the same time, downstream applications may require input data at regular intervals (e.g., every 5 seconds).

Some customers asked us: how can Azure Stream Analytics be used to convert a stream of events with missing values into a stream of events at regular intervals? The last event received should be used to fill in the missing values.

This is easy with Hopping Window:

SELECT
    System.Timestamp AS windowEnd,
    TopOne() OVER (ORDER BY time DESC) AS lastEvent
FROM
    input TIMESTAMP BY T
GROUP BY HOPPINGWINDOW(second, 300, 5)

 

This query will generate an event every 5 seconds and will output the last event received before that point. Please note that, as part of the window definition, you need to specify the window duration – this is how far back the query will look to find the latest event (300 seconds in our example).

 

Please check out more query examples at Query examples for common Stream Analytics usage patterns.


How to configure Azure Stream Analytics outputs to skip events that cannot be written due to conversion errors or schema mismatch


By default, when output events cannot be written to external storage because the data is non-conforming (null values for a non-nullable destination column, strings longer than the destination column supports, values whose types cannot be written to the output, etc.), the job stops processing new events and continuously retries converting/inserting the non-conforming value. This may not always be desirable, so we have recently added an option to drop such events instead. The feature is not yet available via the portal, but it can be configured via PowerShell by following these instructions:

  1. Use PowerShell to retrieve the job (https://msdn.microsoft.com/en-us/library/azure/mt603472.aspx).
  2. Add the optional property outputErrorPolicy with the value "drop", as in the example below. This instructs the job to drop non-conforming events at the output.
  3. Example:

{
  "location": "West US",
  "properties": {
    "sku": {
      "name": "standard"
    },
    "eventsLateArrivalMaxDelayInSeconds": 10,
    "outputErrorPolicy": "drop",
    ...
  }
}

  4. Update the job using PowerShell (https://msdn.microsoft.com/en-us/library/azure/mt603479.aspx).

We are working on releasing this functionality in the portal soon. In the meantime, this workaround can help if you come across the issue.

Integration with Azure Data Lake Store


We are excited to announce that Azure Stream Analytics can output to Azure Data Lake Store, a hyper-scale repository for big data analytics workloads.

This integration makes it even easier to implement a Lambda architecture, where the same data that is analyzed in real time is also stored and later processed offline in batches to unlock powerful insights. A large number of batch processing possibilities are enabled by Data Lake Store’s integration with Azure Data Lake Analytics, Azure HDInsight, upcoming integrations with Microsoft Revolution-R Enterprise, and Hadoop distributions from various industry-leading providers.

Azure Data Lake Store is built for supporting the storage needs of big data analytics systems that require massive throughput to query and analyze petabytes of data. It will be highly useful as data to be analyzed continues to grow exponentially, especially in streaming scenarios such as IoT.

Output to Azure Data Lake Store from Stream Analytics can be enabled by choosing Data Lake Store as the output type for the job.

When Data Lake Store is selected as an output in the Azure Management portal, it is necessary to authorize the usage of an existing Data Lake Store. Details on authorizing and configuring Data Lake Store can be found here.

At this time, the creation and configuration of Data Lake Store outputs is supported only in the Azure Classic Portal.

 

- Sam Chandrashekar, Program Manager

SQL Data Warehouse as output of Azure Stream Analytics


Several of our customers have asked whether we support Azure SQL Data Warehouse as an output sink for an Azure Stream Analytics job. The answer is yes: it can be configured by choosing the "SQL DATABASE" output option.


The SQL Data Warehouse documentation also contains step-by-step instructions on configuring Azure SQL Data Warehouse as an output of an Azure Stream Analytics job.

Troubleshooting Azure Stream Analytics jobs on New Portal


We are working hard to bring you an end-to-end Stream Analytics experience in the new Azure portal and we are almost there! Now you can perform more troubleshooting tasks on portal.azure.com.

Input and output diagnosis: a yellow triangle warning sign on your input or output indicates that something is wrong. Clicking the warning sign opens the input/output property view, where you can check the error message.


 

You may also want to go to the job "Settings" > "Audit logs" to see more detailed logs.

Still having no clue? Click "New support request" in the Settings blade and open a support ticket. The Azure support team will contact you shortly and help out.

You probably noticed the new "Error policy" setting we added recently. Now you can configure how your job handles output errors: drop the output data if the write to the destination fails, or retry until it succeeds. The default behavior is to retry on output errors, but if your scenario can safely ignore some malformed events, the "drop" option will increase the robustness of your stream processing pipeline.

By the way, going forward new features like Error Policy will be available only on the new Azure Portal. It’s time to get familiar with ASA’s new portal experience!

Troubleshooting Azure Stream Analytics jobs with SELECT INTO


Azure Stream Analytics is a fully managed service for real-time processing of data with a flexible SQL-like language. You can easily construct queries to perform complex analysis. However, this also means the system can sometimes be hard to troubleshoot when it does not run as expected. Here is one trick that can help when troubleshooting Azure Stream Analytics jobs.

The SELECT INTO statement

Sometimes knowing what the data looks like in the middle of the query can be very helpful. Since inputs or steps of an Azure Stream Analytics job can be read multiple times, we can write extra SELECT INTO statements to output intermediate data into storage and inspect the correctness of the data, just like “watch variables” when debugging a program. Let’s look at an example.

Example

Here we have a simple Azure Stream Analytics job. It has one stream input, two reference data inputs and an output to Azure Table Storage.

[Screenshot: the job's inputs and output in the portal]

This query joins data from the event hub and two reference blobs to get the name and category information:

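The query itself appeared as a screenshot in the original post. A minimal sketch of the shape it describes is shown below; the input, reference, and join-key names (eventhubInput, refNames, refCategories, id) are assumptions, while output1, [from], name, and category are mentioned in the post.

-- Hedged sketch only; the actual query appeared as a screenshot in the original post.
SELECT
    e.[from],
    n.name,
    c.category
INTO output1
FROM eventhubInput e
JOIN refNames n ON e.[from] = n.id
JOIN refCategories c ON e.[from] = c.id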

This job is running fine. However, no events are being produced in the output. From the Monitoring tile we can see that input is producing data, but we don’t know which step of the JOIN caused all the events to be dropped.

[Screenshot: the Monitoring tile showing input events but no output events]

In this situation, we can add a few extra SELECT INTO statements to “log” the intermediate JOIN results as well as the data read from the input.

Let's first add two new "temporary outputs", temp1 and temp2. They can be any sink you like; here we use Azure Storage as an example.


Then let’s rewrite the query like this:

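The rewritten query also appeared as a screenshot. A hedged sketch of the same idea, reading both the raw input and the intermediate step into the temporary outputs while keeping the original output (names follow the sketch above and are assumptions, except temp1, temp2, and output1):

-- Hedged sketch: step and input names are placeholders; temp1, temp2 and output1 are the
-- outputs described in the post.
WITH NameJoin AS
(
    SELECT e.[from], n.name
    FROM eventhubInput e
    JOIN refNames n ON e.[from] = n.id
)
-- "Log" the raw input into the first temporary output
SELECT * INTO temp1 FROM eventhubInput
-- "Log" the result of the first JOIN into the second temporary output
SELECT * INTO temp2 FROM NameJoin
-- The original output, now built on top of the intermediate step
SELECT j.[from], j.name, c.category
INTO output1
FROM NameJoin j
JOIN refCategories c ON j.[from] = c.id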

 

Now start the job again and let it run for a few minutes. Then we can query temp1 and temp2 with Visual Studio Cloud Explorer:

[Screenshots: rows returned from temp1 and temp2]

As we can see, temp1 and temp2 both have data, and the name column is populated correctly in temp2. However, there is still no data in output:

[Screenshot: output1 is still empty]

Now we are almost certain that the issue is with the second JOIN. Let's download the reference data from the blob and take a look:

[Screenshot: the reference data downloaded from the blob]

Aha! The format of the GUID in this reference data is different from the format of the [from] column in temp2. That's why our data wasn't arriving in output1 as expected. Let's fix the data format, upload it to the reference blob, and try again:

[Screenshot: the reference data with the corrected GUID format]

And now we get data in the output, with name and category nicely populated!

[Screenshot: output rows with name and category populated]

 

Conclusion

As you can see, this trick is very helpful in troubleshooting Azure Stream Analytics jobs. Beyond the scenario shown in the example, it can also be used to troubleshoot issues where a step produces wrong data. Just use the "SELECT INTO" statement and you can troubleshoot your jobs like debugging a program!

 
