StackOverflow

### Answer rating: 174

### Answer rating: 72

I want to write a function that randomly picks elements from a training set, based on the **bin probabilities** provided. I **divide the set indices into 11 bins**, then create **custom probabilities** for them.

```
import numpy as np

bin_probs = [0.5, 0.3, 0.15, 0.04, 0.0025, 0.0025, 0.001, 0.001, 0.001, 0.001, 0.001]
X_train = list(range(2000000))
train_probs = bin_probs * int(len(X_train) / len(bin_probs))  # extend probabilities across bin elements
train_probs.extend([0.001] * (len(X_train) - len(train_probs)))  # a small fix to match number of elements
train_probs = train_probs / np.sum(train_probs)  # normalize
indices = np.random.choice(range(len(X_train)), replace=False, size=50000, p=train_probs)
out_images = X_train[indices.astype(int)]  # this is where I get the error
```

I get the following error:

```
TypeError: only integer scalar arrays can be converted to a scalar index
```

I find this weird, since I already checked the array of indices that I have created. It is **1-D**, it is **integer**, and it is **scalar**.

What am I missing?

Note: I tried passing `indices` with `astype(int)`. Same error.

Perhaps the error message is somewhat misleading, but the gist is that `X_train` is a list, not a NumPy array. You cannot use array indexing on it. Make it an array first:

```
out_images = np.array(X_train)[indices.astype(int)]
```
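As a rough illustration of the point above, here is a minimal sketch with made-up data. The names `items` and `idx` are hypothetical stand-ins for `X_train` and `indices`; they are not from the question. It shows the list lookup failing and two ways around it, assuming the goal is simply to pick the elements at the sampled positions:

```python
import numpy as np

items = list(range(10))      # plain Python list (stand-in for X_train)
idx = np.array([7, 2, 5])    # 1-D integer index array (stand-in for indices)

# A list accepts a single integer or a slice as its index, not an array of
# positions, so this lookup raises TypeError.
try:
    items[idx]
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# Option 1: convert the list to an array, then use integer-array indexing.
picked_array = np.asarray(items)[idx]

# Option 2: keep the list and pick the elements one by one.
picked_list = [items[i] for i in idx]
```

Option 1 is usually the better fit when the data is numeric and large. Option 2 avoids the conversion and also works when the list holds objects that do not pack neatly into an array.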

I get this error whenever I use `np.concatenate` the wrong way:

```
>>> a = np.eye(2)
>>> np.concatenate(a, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in concatenate
TypeError: only integer scalar arrays can be converted to a scalar index
```

The correct way is to input the two arrays as a tuple:

```
>>> np.concatenate((a, a))
array([[1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.]])
```
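The reason for the error, as far as the function signature goes, is that the second positional parameter of `np.concatenate` is `axis`, which must be an integer. In `np.concatenate(a, a)` the second `a` is therefore taken as the axis, and NumPy fails when it tries to convert that array to an integer. A small sketch (the variable names are illustrative only):

```python
import numpy as np

a = np.eye(2)

# The second `a` is bound to the `axis` parameter, which must be an integer,
# so this call raises TypeError.
try:
    np.concatenate(a, a)
    axis_error = None
except TypeError as exc:
    axis_error = str(exc)

# Pass the arrays together as one sequence; axis is then given separately.
stacked_rows = np.concatenate((a, a))          # default axis=0, shape (4, 2)
stacked_cols = np.concatenate((a, a), axis=1)  # side by side, shape (2, 4)
```

Passing `axis` as a keyword keeps the two roles visually distinct and makes this mistake harder to repeat.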
