Archive for the ‘Uncategorized’ Category

Windows 10 on ARM is looking good

May 26, 2018



  • WoW64


Transfer Learning sample from Microsoft AI framework – CNTK

March 27, 2018

Transfer learning is a great way to save on resources by transferring learning from an existing model. More info here:

Transfer Learning is a useful technique when, for instance, you know you need to classify incoming images into different categories, but you do not have enough data to train a Deep Neural Network (DNN) from scratch. Training DNNs takes a lot of data, all of it labeled, and often you will not have that kind of data on hand. If your problem is similar to one for which a network has already been trained, though, you can use Transfer Learning to modify that network to your problem with a fraction of the labeled images (we are talking tens instead of thousands).

What is Transfer Learning?

With Transfer Learning, we use an existing trained model and adapt it to our own problem. We are essentially building upon the features and concepts that were learned during the training of the base model. With a Convolutional DNN (ResNet_18 in this case), we are using the features learned from ImageNet data and cutting off the final classification layer, replacing it with a new dense layer that will predict the class labels of our new domain.

The input to the old and the new prediction layer is the same, we simply reuse the trained features. Then we train this modified network, either only the new weights of the new prediction layer or all weights of the entire network.

This can be used, for instance, when we have a small set of images that are in a similar domain to an existing trained model. Training a Deep Neural Network from scratch requires tens of thousands of images, but training one that has already learned features in the domain you are adapting it to requires far fewer.

In our case, this means adapting a network trained on ImageNet images (dogs, cats, birds, etc.) to flowers, or sheep/wolves. However, Transfer Learning has also been successfully used to adapt existing neural models for translation, speech synthesis, and many other domains – it is a convenient way to bootstrap your learning process.
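To make the idea concrete, here is a minimal NumPy sketch of the pattern: a frozen "pretrained" feature extractor plus a small, newly trained prediction head. The random projection and toy labels are stand-ins for illustration, not the CNTK sample's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen random projection stands in for the pretrained convolutional
# layers (e.g. ResNet_18's feature extractor). It is never updated.
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    # "Frozen" feature extractor: ReLU of a fixed projection, rescaled
    return np.maximum(x @ W_frozen, 0.0) / np.sqrt(64)

# A small labeled set in the new domain (tens of samples, not thousands)
X = rng.normal(size=(40, 64))
features = extract_features(X)
w_true = rng.normal(size=16)
y = (features @ w_true > 0).astype(float)  # toy labels for the new task

# The new prediction head: the ONLY weights we actually train
w = np.zeros(16)
b = 0.0
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # sigmoid head
    grad = p - y
    w -= lr * features.T @ grad / len(y)
    b -= lr * grad.mean()

pred = (features @ w + b) > 0
acc = (pred == y).mean()  # training accuracy of the adapted head
```

The point is the split: all the "knowledge" lives in the frozen extractor, and only a tiny head (17 parameters here) is fitted to the new labels, which is why so few labeled images suffice.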

Here is an example:



CNTK contains a transfer learning sample. I have a machine with an Nvidia GeForce GTX 960M GPU.

Here is what happens when I start this sample with Python:

Activate CNTK

C:\CNTK-Samples-2-4\Examples\Image\TransferLearning>conda activate cntk-py36

(cntk-py36) C:\CNTK-Samples-2-4\Examples\Image\TransferLearning>


(cntk-py36) C:\CNTK-Samples-2-4\Examples\Image\TransferLearning>python
Traceback (most recent call last):
  File "", line 9, in <module>
    import cntk as C
ModuleNotFoundError: No module named 'cntk'


This gets resolved by using the Python interpreter from the Anaconda installation.


(cntk-py36) C:\Users\Dell\Downloads\ethereum-mining-windows>where python

(cntk-py36) C:\CNTK-Samples-2-4\Examples\Image\TransferLearning>C:\Users\Dell\Anaconda3\python.exe
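A quick way to confirm which interpreter is actually running (generic Python, nothing CNTK-specific): if this doesn't print the Anaconda environment's python.exe, the `cntk` package won't be found.

```python
import sys

# Print the full path of the interpreter currently executing;
# compare it against what `where python` reported.
print(sys.executable)
```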


Selected GPU[0] GeForce GTX 960M as the process wide default device.
Build info:

Built time: Jan 31 2018 14:57:35
 Last modified date: Wed Jan 31 01:10:27 2018
 Build type: Release
 Build target: GPU
 With 1bit-SGD: no
 With ASGD: yes
 Math lib: mkl
 CUDA version: 9.0.0
 CUDNN version: 7.0.5
 Build Branch: HEAD
 Build SHA1: a70455c7abe76596853f8e6a77a4d6de1e3ba76e
 MPI distribution: Microsoft MPI
 MPI version: 7.0.12437.6
Training transfer learning model for 20 epochs (epoch_size = 6149).
Training 15949478 parameters in 68 parameter tensors.
CUDA failure 2: out of memory ; GPU=0 ; hostname=DESKTOP-IA3HLGI ; expr=cudaMalloc((void**) &deviceBufferPtr, sizeof(AllocatedElemType) * AsMultipleOf(numElements, 2))
Traceback (most recent call last):
 File "", line 217, in <module>
 max_epochs, freeze=freeze_weights)
 File "", line 130, in train_model
 trainer.train_minibatch(data) # update model with it
 File "C:\Users\Dell\Anaconda3\lib\site-packages\cntk\train\", line 181, in train_minibatch
 arguments, device)
 File "C:\Users\Dell\Anaconda3\lib\site-packages\cntk\", line 2975, in train_minibatch_overload_for_minibatchdata
 return _cntk_py.Trainer_train_minibatch_overload_for_minibatchdata(self, *args)
RuntimeError: CUDA failure 2: out of memory ; GPU=0 ; hostname=DESKTOP-IA3HLGI ; expr=cudaMalloc((void**) &deviceBufferPtr, sizeof(AllocatedElemType) * AsMultipleOf(numElements, 2))

 > Microsoft::MSR::CNTK::CudaTimer:: Stop
 - Microsoft::MSR::CNTK::CudaTimer:: Stop (x2)
 - Microsoft::MSR::CNTK::GPUMatrix<float>:: Resize
 - Microsoft::MSR::CNTK::Matrix<float>:: Resize
 - std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>
 - std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>:: shared_from_this (x3)
 - CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD
 - CNTK:: CreateTrainer
 - CNTK::Trainer:: TotalNumberOfUnitsSeen
 - CNTK::Trainer:: TrainMinibatch (x2)
 - PyInit__cntk_py (x2)

I ran into a problem with memory. My graphics card has 2 GB of dedicated memory, as you can see below:

Apparently that isn’t enough. The next question I had: does CUDA use dedicated GPU memory or shared system memory? My graphics card has 3.9 GB of shared system memory:

So my total graphics memory is 5.9 GB.


It turns out that I was running out of dedicated GPU memory, not shared system memory.

I couldn’t find any GPU memory setting, but reducing the minibatch size did the trick. On line 41 of the sample’s source code, changing the variable from ‘mb_size = 50’ to ‘mb_size = 30’ resolved the error. Your mileage may vary based on your GPU, so do experiment.
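More generally, a training driver can back off the minibatch size automatically when it hits a CUDA out-of-memory error. Here is a sketch; `fake_train` is a made-up stand-in simulating my GPU's limit, not the sample's real training loop:

```python
def train_with_fallback(train_epoch, mb_size=50, min_size=10):
    """Retry training with smaller minibatches on GPU out-of-memory.

    `train_epoch` stands in for whatever function drives training;
    it is expected to raise RuntimeError on a CUDA OOM failure.
    """
    while mb_size >= min_size:
        try:
            train_epoch(mb_size)
            return mb_size          # this size fits in GPU memory
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise               # some other failure: don't mask it
            mb_size -= 10           # shrink the minibatch and retry
    raise RuntimeError("minibatch size too small to continue")


# Simulated trainer: pretend anything above 30 samples per minibatch
# overflows a 2 GB card (mirrors what I saw on the GTX 960M).
def fake_train(mb_size):
    if mb_size > 30:
        raise RuntimeError("CUDA failure 2: out of memory")


used = train_with_fallback(fake_train)
print(used)  # settles at 30, the size that worked for me
```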

AI next conference Jan. 2018 Seattle WA

February 6, 2018

The NextCon conference series began early last year, organized by the Association of Technology and Innovation (ATI) with Bill Liu as the lead. This followed the previous conference in Seattle in March 2017. Many companies sponsored the event and gave tech talks, and roughly 400 people attended.

Conference Schedule:





Here is a quick summary of the recently concluded NextCon AI conference in Seattle, Jan 2018. It covers my key takeaways and learnings, keynote summaries, summaries of the breakout sessions I attended, and my side discussions, along with links and resources for the tech talks, blogs, and videos. It also includes some of my follow-up blog reading to understand concepts. I hope you find it as useful as I did!


The conference had 4 tracks:

  1. Computer Vision
  2. Speech /NLP
  3. Data Science/Analytics
  4. Machine Learning

With limited time, I mainly picked the Data Science and Machine Learning tracks to understand trends in how to handle, and make sense of, large amounts of data.


  1. Key takeaways & learnings for us:

These are some of the things I distilled and filtered from the conference as areas of interest.

  1. AI is a great tool to have in your toolbox, but it isn’t the end-all of tools, at least for now (this could change in the future). For example, an AI that can disambiguate and find flaws in electronic welds can’t tell you whether a kid is holding a toothbrush or a baseball bat.
  2. Reinforcement learning (RL) is making a comeback and yielding great results, though at a somewhat higher cost in latency, time, and infrastructure. Martin Görner from Google showed how he trained a Pong-playing AI with just historic data and by making it play itself, generating lots of data and getting better at it. Think of how a kid learns to bike or learns to walk. Note that neural networks are algorithms, while RL is a problem type; you can approach RL with neural networks. What makes RL very different is that you typically don’t have a lot of data to start with, but you can generate a lot by playing. You also have to deal with delayed reward: you must make decisions, but it is not immediately clear which ones are good. For example, in Go it might take several moves until you know whether a move was smart.
  3. OS for AI – The next frontier will be people using each other’s algorithms and models to compose sophisticated aggregate services. For example, John Peck from Algorithmia showed how one person could write a fruit classifier, another a vegetable classifier, and a third party could aggregate them into a fruit-or-vegetable classifier.
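To make the "generate data by playing" idea concrete, here is a minimal tabular Q-learning sketch on a toy 5-state chain. This is my own illustration of the RL loop, unrelated to the Pong demo's actual code:

```python
import random

random.seed(0)

N = 5                                  # states 0..4; reward only at state 4
Q = [[0.0, 0.0] for _ in range(N)]     # Q[state][action]: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                   # "generate data by playing"
    s = 0
    while s != N - 1:
        # epsilon-greedy: mostly exploit, sometimes explore
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == N - 1 else 0.0          # delayed reward
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# The learned greedy policy walks right toward the reward from every state
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]
```

No labeled dataset exists up front; the agent creates its own experience by playing episodes, and the delayed reward at the end of the chain propagates backward through the Q-values.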

The key enablers here are composability and elastic scale.

Algorithmia may be a good resource for paying for ML algorithms. I suggested to them after the talk that they also offer data for a fee; this is in line with the Data Science as a Service idea.
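The fruit-or-vegetable aggregation idea can be sketched in a few lines. These classifiers are purely hypothetical stand-ins for independently published models:

```python
# Two independently authored classifiers...
def fruit_classifier(name):
    return {"apple": "fruit", "banana": "fruit"}.get(name)

def vegetable_classifier(name):
    return {"carrot": "vegetable", "kale": "vegetable"}.get(name)

# ...composed by a third party into an aggregate service,
# without touching either underlying model
def fruit_or_vegetable(name):
    for clf in (fruit_classifier, vegetable_classifier):
        label = clf(name)
        if label is not None:
            return label
    return "unknown"

print(fruit_or_vegetable("carrot"))   # prints "vegetable"
```

The composition layer needs no knowledge of how each classifier works internally, which is exactly what makes an "OS for AI" style marketplace possible.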

4. AutoML, or off-the-shelf machine learning methods – Machine learning is evolving at a strong pace. Increasingly, it is possible to just feed the AI platform a dataset and have it tune the hyperparameters and come up with a trained model.

5. In the grand scheme of things, AI is currently pretty early in its evolution.


6. The future of AI, from Prof. Oren Etzioni – When will superintelligence arrive? AI experts try to answer the question. It is still far out!!

7. Another interesting talk was by Twitter on online ML and why they didn’t use deep learning. Deep learning currently has some disadvantages, especially in real-time, low-latency scenarios. More details on this below.

8. Deep learning is providing a lot of value, but it comes at a cost: it requires a large data set and a solid hardware infrastructure. Unfortunately, in deep learning people usually see very sublinear speedups from many GPUs, so top performance requires top-of-the-line GPUs.

9. The Microsoft AI platform is super rich in terms of tools, services, third-party tool integrations, etc.

Microsoft demoed Azure ML workbench which seems like a really cool tool for the time consuming activity  of data wrangling.


2. Conference KeyNote summary:

  1. Steve Guggenheimer from Microsoft

Steve talked about the Microsoft AI platform and the applications already built on many of its features.


Microsoft AI platform-





The platform is super rich in terms of tools, services, third-party tool integrations, etc.

Ethics in AI

Microsoft realizes the potential of AI and how it can be misused, so Steve shared Microsoft’s AI ethics principles. Satya Nadella has talked about compassionate AI as the AI of the future.

Microsoft has published a nice book on this subject  called “The Future Computed”


I also liked the live demo of how the Bing team uses specialized FPGAs. FPGAs, or Field-Programmable Gate Arrays, are programmable hardware devices, something like a processor dedicated to a specific task rather than general-purpose computing, which allows optimizations to be built in.


CPU vs. FPGA performance within the Bing team – FPGAs and ASIC derivatives dedicated to a specific task perform really well while drawing a fraction of the power.


2. AI at Didi Chuxing – Didi Chuxing is like the Uber of China, and the scale they deal with is humongous. I liked Didi Chuxing’s presentation on how they are using AI in the transportation sector. A lot of it can be applied to other fields, as the problems are similar in nature. They presented the iterations through which they solved their problems with various AI algorithms, starting with regression models and narrowing down to deep learning and reinforcement learning for forecasting, ETA, dispute resolution, etc. Deep learning has helped them solve more problems.


They have applied AI to multiple problem areas within transportation –


More details here:

 3. UW Prof. Oren Etzioni also presented a good deck on the future of AI, framed more as “Is AI the evil power it is made out to be?” than as typical technical trend analysis.


The Winograd Schema Challenge is an alternative to the Turing Test, developed by Hector Levesque.

The Turing Test is intended to serve as a test of whether a machine has achieved human-level intelligence. In one of its best-known versions, a person attempts to determine whether he or she is conversing (via text) with a human or a machine. However, it has been criticized as inadequate: at its core, the Turing Test measures a human’s ability to judge deception. Can a machine fool a human into thinking that it too is human? This suggests that the Turing Test may not be an ideal way to judge a machine’s intelligence. An alternative is the Winograd Schema Challenge.

Rather than basing the test on the sort of short free-form conversation suggested by the Turing Test, the Winograd Schema Challenge (WSC) poses a set of multiple-choice questions that have a particular form. The test is dedicated to furthering and promoting research in the field of formal commonsense reasoning. A classic example: “The trophy doesn’t fit in the brown suitcase because it is too big. What is too big?” Answering correctly requires commonsense knowledge rather than pattern matching.

4. “Tensorflow and deep reinforcement learning without a PhD” by Martin Görner from Google.

He briefly alluded to Auto ML which learns the model architecture directly on the dataset of interest:


Google DeepMind taught a virtual human to walk and jump. I was an athlete, and when I look at the image below, the stride and the arms are just what a good long jumper would use and be proud of 🙂


Demonstration of playing Pong, without any specialized algorithms, using deep reinforcement learning and lots of data:

 Link to the Video of his talk: –

Google deep mind taught itself to walk –


5. Keynote – “Deep learning at Amazon Alexa” by Nikko Strom from Amazon

This is very powerful: it shows how Alexa uses multimodality along with device and personal context. This can really engage the user!!

3. Summary of Breakout sessions (I attended)

  1. ML track – Twitter – Parameter Server approach for online ML at Twitter

The talk discussed the evolution of parameter servers at Twitter, which need to scale and support real-time approaches to online ML. Their approach centers on load balancing, filtering, and centralized parameter servers. They have tried deep learning, but they found that, as of now, it is not working for them:

Some of the disadvantages of Deep learning:

  1. Latency for their usage is too high
  2. Model quality was not improved enough to justify the ROI
  3. New approaches in ML could displace deep learning
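The centralized parameter-server pattern described above can be sketched, very roughly, in plain Python. The class names and the toy gradient are illustrative; this is not Twitter's actual system:

```python
import threading

class ParameterServer:
    """Minimal centralized parameter server: workers pull the current
    weights and push gradient updates back."""
    def __init__(self, dim):
        self.w = [0.0] * dim
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.w)          # snapshot of current weights

    def push(self, grad, lr=0.1):
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] -= lr * g      # apply the worker's update

ps = ParameterServer(dim=3)

def worker(_):
    w = ps.pull()                        # fetch latest weights
    grad = [wi - 1.0 for wi in w]        # toy gradient pulling w toward 1.0
    ps.push(grad)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The interesting production concerns (stale gradients, sharding the parameters across servers, load balancing the pulls) all layer on top of this basic pull/push contract.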


  2. ML track – Machine learning at scale by Amy Unruh from Google

The talk showed that there is a gap in the Google ML offering, which is being addressed by AutoML for vision. It also compared the various techniques in terms of the resources typically needed to solve an AI problem:

1) Time
2) Prediction code
3) Serving infrastructure
4) Model code
5) Training data


Resources needed to solve an AI problem, per Google


ML as an API – mainly time and prediction code


Custom code and model – more resource-intensive


Custom model with transfer learning from another project – takes less time and can reuse model code and training data


Google has identified a gap in the continuum from DIY ML to ML APIs.


They are trying to address it with AutoML, which is currently limited to the Vision API only.


AutoML – Currently Google has it only for the Vision API, but it allows deep neural networks to be auto-generated.

It allows savings on model code, infrastructure, etc.


You only need to provide training data; it trains, deploys, and creates a neural network automatically.


Under the hood, it creates new neural network layers automatically.
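As a toy illustration of this kind of architecture search, here is a random search over layer counts and widths. The scoring function is completely made up and merely stands in for training each candidate; this is not how Google's AutoML works internally:

```python
import random

random.seed(1)

def train_and_score(num_layers, width):
    """Stand-in for actually training a candidate network: a made-up
    objective that happens to prefer ~3 layers of width ~32."""
    return -abs(num_layers - 3) - abs(width - 32) / 16.0

best_score, best_arch = None, None
for _ in range(50):                    # random architecture search
    arch = (random.randint(1, 6), random.choice([8, 16, 32, 64, 128]))
    score = train_and_score(*arch)
    if best_score is None or score > best_score:
        best_score, best_arch = score, arch
```

The user-facing contract matches the description above: provide data (here, the scoring oracle), and the system explores candidate network shapes on its own.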

The content below came from Martin Görner’s Google keynote, titled “Tensorflow and deep reinforcement learning without a PhD”.




More details on Auto ML  here:

 3. Deep multimodal intelligence by Xiaodong He from Microsoft –

Xiaodong He of Microsoft Research described the steps involved in describing a scene with natural language:

  1. Understanding the image’s content
  2. Reasoning about relationships among objects and concepts
  3. Generating a story in natural language

However, true understanding of the world is much more challenging.


There were quite a few other parallel talks  but time was limited 😦

4. Presentation Tidbits

A slight digression on a nice presentation tool I found some speakers using at the conference, which complements the laser pointer. It retails for around $130.


It is great for highlighting code, etc.


5.  Links /References



Papers — Lots of good papers on KDD:







How to Kernel Debug Connected Standby/Modern Standby systems?

October 24, 2016


Debugging a Modern Standby (previously Connected Standby) scenario can be challenging, as there are some subtle things to keep in mind. Most Modern Standby/Connected Standby systems are newer systems with USB 3.0 xHCI controllers, so this blog post focuses only on systems that support USB 3.0 debugging.

What you need:

  1. USB 3.0 cable –
  2. USB Type-C to Type-A adapter – needed only if the device doesn’t have a USB Type-A port
  3. WinDbg bits – available from many sources, including the kits (WDK or ADK)

Methodology to setup Kernel Mode debugging

  1. Set up the machine for USB 3.0 debugging as mentioned here:
  2. Make sure you disable Secure Boot in the BIOS menu
  3. Hook up the cable
  4. Check the USB device hierarchy and turn off power saving for all of its components. You can do this from Device Manager or USB tools
  5. Disable power saving on the USB hub(s) on the target: uncheck the box that allows the computer to turn off the device to save power
  6. Disable power saving on the USB xHCI controller: uncheck the box that allows the computer to turn off the device to save power
  7. If there are multiple controllers or hubs, make sure you pick the one you plan to debug through. If there is another level of hub in between, do the same for it as well.
  8. Debug away!!


Decrypting wdf01000.sys interrupts with WPA

June 13, 2014

If you are trying to figure out which WDF driver is the source of all the interrupts, there is a way out. Since Wdf01000.sys fields all interrupts and then calls the actual driver, it is difficult to tell which driver caused them. Fortunately, you can use the kernel trace flags WDF_INTERRUPT and WDF_DPC. You can list all kernel trace flags with the command “xperf -providers KF”.

You can trace as follows:

xperf -on diageasy+WDF_DPC+WDF_INTERRUPT+0x48000000+PROC_THREAD+LOADER+INTERRUPT+DPC+CSWITCH+TIMER+CLOCKINT -stackwalk TimerSetPeriodic+TimerSetOneShot+CSwitch+readythread+profile -clocktype perfcounter -buffersize 1024 -minbuffers 1024

xperf -d test.etl